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Quickest  Detection  Procedures  and  Transient  Signal  Detection 

Bruce  Broder 

Abstract 

This  dissertation  focuses  on  sequential  decision  procedures  to  detect  changes  in  the 
statistical  model  of  an  observed  random  process  when  these  changes  can  occur  at  unknown 
times.  In  the  disorder  problem,  the  samples  are  drawn  according  to  one  statistical  model 
until  some  unknown  time  after  which  all  the  samples  correspond  to  a  second  model.  In  the 
transient  problem,  the  model  then  reverts  back  to  the  first  one  after  some  finite  time. 

A  detection  procedure  known  as  Page’s  test  is  investigated  for  the  quick  detection  of 
the  disorder.  A  simple  asymptotic  measure  is  defined  and  an  analytic  formula  is  developed 
which  can  be  useful  in  evaluating  the  performance  of  Page’s  test  in  various  situations.  By 
examining  the  local  performance,  it  is  found  that  the  performance  is  directly  related  to  the 
efficacy  in  the  binary  hypothesis  testing  situation,  allowing  the  wealth  of  results  in  that 
context  to  be  transferred  to  Page’s  test. 

Because  Page’s  test  appears  to  detect  changes  in  distribution  quickly,  it  is  a  candidate 
for  transient  signal  detection.  It  is  used  to  detect  transients  of  arbitrary  spectral  shape  by 
developing  the  test  based  on  the  DFT  of  the  observations.  This  test  is  compared  against 
other  sequential  detection  schemes  and  it  is  shown  that  Page’s  test  performs  very  favorably 
and  is  easier  to  implement. 

The  Gabor  signal  representation  in  which  a  signal  is  modeled  by  exponentially  decay- 


ing  sinusoids  is  studied  for  its  use  in  transient  signal  detection.  A  detection  algorithm  is 
developed  using  the  coefficients  of  these  basis  functions.  The  focus  is  on  the  procedure 
for  computing  the  coefficients  and  the  effect  on  detection  performance.  A  sequential  algo¬ 
rithm  for  computing  the  coefficients  is  derived  so  that  a  real-time  detection  scheme  can  be 
implemented. 

Finally,  the  optimal  sample  size  for  the  detection  of  a  transient  signal  which  can  occur 
any  time  during  a  fixed  observation  period  is  determined.  The  optimization  is  done  Jissuming 
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Introduction 


1.1  Motivation 


This  dissertation  focuses  on  sequential  decision  procedures  to  detect  changes  in  the 
statistical  model  of  an  observed  random  process  when  these  changes  can  occur  at  unknown 
times.  In  classical  sequential  detection  procedures  such  as  those  considered  by  Wald  [9],  it 
is  assumed  that  all  the  time  samples  come  from  one  statistical  model  or  from  an  alternative, 
i.e.,  the  sequential  binary  hypothesis  situation.  The  problem  of  detecting  the  change  from 
one  statistical  model  to  a  second  model  had  not  been  considered  until  the  works  of  Page  [3] 
and  Shiryaev  [5-8]. 

Two  types  of  detection  problems  which  depend  on  the  time  element  are  considered  in 
this  work:  the  disorder  problem  and  the  transient  problem.  In  the  disorder  problem,  the 
samples  are  drawn  according  to  one  statistical  model  until  some  unknown  time  after  which 
aU  the  samples  correspond  to  a  second  model.  In  the  quickest  detection  formulation  of  this 
problem,  it  is  desired  to  detect  the  time  of  change  w’ith  minimum  delay  for  a  fixed  mean  time 
between  false  alarms.  This  is  in  contrast  to  the  approach  in  the  classical  detection  problem 
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where  the  probability  of  detection  is  maximized  for  a  fixed  probability  of  false  alarm.  The 
quickest  detection  problem  was  first  considered  by  Page  [3]  in  the  context  of  quality  control 
in  a  manufacturing  process  where  it  is  desired  to  detect  when  the  machinery  fails  and  the 
items  subsequently  produced  are  defective.  This  model  also  has  applications  in  the  area  of 
signal  processing.  For  example,  it  could  model  a  sonar  observation  of  a  submarine  which 
suddenly  turns  on  its  engines.  In  the  transient  problem,  the  scimples  again  come  from  one 
model  '.»ntil  an  unknown  time,  after  which  the  samples  are  drawn  from  a  second  model  which 
lasts  for  a  known  or  unknown  but  finite  number  of  samples.  Then,  the  samples  are  again 
drawn  according  to  the  first  statistical  model.  In  this  case,  the  performance  of  a  sequential 
detection  procedure  is  evaluated  according  to  the  probability  of  detecting  the  transient  and 
the  mean  time  between  false  alarms.  The  transient  model  is  also  applicable  to  a  variety  of 
signal  processing  problems. 


1.2  Overview 


The  body  of  this  dissertation  is  divided  into  four  chapters  as  summarized  below.  With 
the  exception  of  Chapter  3,  which  uses  some  material  from  Chapter  2,  each  of  the  chapters 
can  be  read  and  understood  independently. 

In  Chapter  2,  a  class  of  detection  procedures  is  investigated  in  the  context  of  non- 
parametric  quickest  detection.  The  procedures  are  a  generalization  of  a  test  first  studied 
by  Page  in  1954  [3]  to  detect  a  change  in  the  distribution  of  random  variables  observed 
sequentially  in  time.  The  goal  is  to  detect  the  unknown  time  of  change  with  minimum 
delay  while  insuring  infrequent  false  alarms.  Originally,  Page’s  test  was  implemented  with 


2 


the  log-likelihood  ratio  for  the  two  distributions  under  consideration.  In  this  work,  the  test 
is  generalized  by  implementing  it  with  other  nonlinearities  which  do  not  depend  on  the 
distributions.  The  linear  detector,  sign  detector  and  dead  zone  limiter  are  investigated  in 
detail,  but  the  theoretical  results  apply  to  arbitrary  nonlinearities. 

Methods  for  generating  performance  curves  for  Page’s  test  are  discussed.  Based  on 
the  performance  curves,  a  simple  asymptotic  measure  is  defined  and  an  anzilytic  formula  is 
developed  which  can  be  useful  in  evaluating  the  performance  of  Page’s  test  using  different 
nonlinearities  for  various  noise  distributions.  Also,  by  e.xamining  the  local  performance 
of  Page’s  test,  it  was  found  that  the  performance  is  directly  related  to  the  efficacy  in  the 
classical  binary  hypothesis  testing  situation.  Thus,  the  wealth  of  results  on  using  memory¬ 
less  nonlinearities  in  classical  detection  theory  can  be  transferred  to  the  quickest  detection 
problem  using  Page’s  test. 

Because  Page’s  test  detects  changes  in  distribution  quickly,  one  might  expect  that  it 
would  be  useful  for  detecting  transient  signals  which  occur  at  unknown  times.  In  Chapter 
3,  the  performance  of  Page’s  test  is  evaluated  for  transient  signal  detection  using  the  prob¬ 
ability  of  detection  and  the  mean  time  between  fzJse  alarms  as  the  performance  criteria. 
By  developing  a  test  based  on  the  Discrete  Fourier  Transform  of  the  observations,  Page’s 
test  can  be  used  to  detect  transients  of  arbitrary  spectral  shape.  This  test  is  compared 
against  other  sequential  detection  schemes  including  one  proposed  by  Wolcin  [10].  The  re¬ 
sults  indicate  that  Page’s  test  performs  as  well  as  the  other  tests  in  almost  all  the  examples 
considered  and  it  is  also  the  easiest  to  implement. 

In  Chapter  4,  a  signal  representation  due  to  Gabor  [2]  is  studied  because  of  its  potential 
use  for  transient  signal  detection.  This  idea  was  first  suggested  by  Friedlander  and  Porat 
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[l].  Essentially,  the  transient  is  modeled  by  a  group  of  exponentially  decaying  sinusoids 
with  arbitrary  arrival  times.  By  decomposing  a  signal  into  a  set  of  such  basis  functions 
and  finding  the  corresponding  coefficients,  a  detection  algorithm  is  developed.  This  chapter 
focuses  on  the  two  methods  of  computing  the  coefficients,  the  biorthogonal  function  method 
used  in  [l]  and  maximum  likelihood  estimation.  It  is  demonstrated  that  detectors  based 
on  the  maximum  likelihood  estimates  outperform  those  based  on  the  other  method.  A 
sequential  algorithm  for  computing  the  maximum  likelihood  estimates  is  derived  so  that  a 
real-time  detection  scheme  can  be  implemented. 

Often  when  a  real-time  signal  processor  is  built  in  hardware,  the  incoming  data  is 
buffered  into  fixed-size  blocks  to  facilitate  the  implementation.  In  Chapter  5,  the  optimal 
block  size  for  transient  signal  detection  is  determined.  This  is  motivated  by  the  work  of 
Pelkowitz  and  Schwartz  [4]  in  determining  the  optimal  block  size  for  the  quickest  detection 
problem.  For  the  present  situation,  the  sample  size  is  optimized  so  as  to  maximize  the 
probability  of  detecting  a  transient  that  can  occur  at  any  time  during  the  integration  period. 
The  two  optimization  criteria  considered  are  similar  to  those  found  in  [4],  the  mean  criterion 
and  the  minimax  criterion.  In  the  mean  criterion,  the  sample  size  is  chosen  to  maximize 
the  mean  probability  of  detection,  assuming  that  the  transient  is  equally  likely  to  arrive 
any  time  during  the  integration  period.  In  the  minimax  criterion,  the  optimal  sample  size 
maximizes  the  probability  of  detection  for  the  worst  possible  arrival  time. 

Finally,  the  original  results  of  this  dissertation  are  summarized  in  Chapter  6  along  with 
suggested  areas  for  further  research. 
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2 


Nonparametric  Quickest  Detection 

and 

The  Performance  of  Page’s  Test 


2.1  Introduction 


VV'hen  a  change  in  the  probability  laws  of  a  random  process  occurs,  a  disorder  is  said 
to  take  place.  The  time  at  which  .le  disorder  occurs  is  called  the  disorder  time  and  is 
either  an  unknown  parameter  or  a  random  variable  with  a  known,  assumed,  or  unknown 
distribution.  In  the  simplest  case,  a  discrete-time  random  process,  A'j,i  =  1,2,3,...  is 
considered  [36,37].  Before  the  disorder  time,  m,  the  random  variables  A'j,  A'o, . .  • 
are  independent  and  identically  distributed  (i.i.d.)  with  distribution  function  Fq.  and  after 
the  disorder,  the  random  variables  A'nn-^m+ii-- •  i.i.d.  with  distribution  Fj  and  are 
independent  of  Xi,X2, . . .  This  model  was  originally  used  for  quality  control  in 

the  manufacturing  of  a  product  where  it  is  desired  to  detect  when  the  machinery  fails  and 
the  items  subsequently  produced  are  defective  [31].  The  same  model,  however,  can  also 
be  used  for  signals  in  a  radar  or  sonar  observation.  For  example,  with  continuous-time 
observations,  consider  a  Brownian  motion,  B<,  whose  drift  changes  at  the  disorder  time  to 
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[37-39].  Specifically,  let  Bt  satisfy  the  stochastic  differential  equation 


dBt  =  u{t  -  to)dt  +  drv( 


where  wt  is  a  Wiener  process  and  u{t)  is  the  unit  step  function 


for  t  >  0 
otherwise. 


This  might  be  the  situation  when  looking  at  a  radar  return  with  no  targets  present,  when, 
at  some  unknown  time,  a  target  comes  over  the  horizon  into  the  view  of  the  radar. 


Detecting  the  disorder  can  be  thought  of  as  a  classical  decision  problem  with  an  infi¬ 
nite  number  of  hypotheses,  one  hypothesis  for  each  possible  disorder  time.  In  this  sense, 
detecting  the  disorder  is  more  akin  to  an  estimation  problem  where  a  combined  “detector- 
estimator”  can  either  declare  that  a  disorder  is  not  present  or  give  an  estimate  of  the  disorder 
time  when  declaring  the  disorder  present. 

There  are  two  distinct  algorithmic  approaches  to  the  disorder  problem,  on-line  and  off¬ 
line  processing  [2].  In  the  latter,  the  random  process  is  observed  over  a  finite  time  interval 
[/l,t2]  or  only  a  finite  number  of  samples  are  used  [3,7,17,32,34,35,40].  The  problem  is  then 
one  of  classical  hypothesis  testing, 

IIq  ;  No  disorder  jccurs  in  [fj.  <>] 
versus 

III  '■  A  disorder  occurs  in  [fi,<2]- 

In  this  case,  the  performance  is  measured  in  terms  of  the  probability  of  detection  versus 
the  probability  of  false  alarm.  Often,  in  order  to  obtain  tractable  results,  the  number  of 
samples  is  made  asymptotically  infinite  [13-16].  In  on-line  processing,  the  observations  are 

t 


processed  sequentially  and  a  finite  observation  interval  is  not  assumed.  Here,  performance 
is  measured  in  terms  of  delay  in  detection  when  the  disorder  occurs  versus  the  mean  time 
between  false  alarms  when  the  disorder  is  absent.  When  this  approach  is  adopted,  the 
disorder  problem  is  termed  the  “quickest  detection”  problem  to  indicated  the  performance 
criteria  used  for  the  sequential  processing.  This  approach  is  taken  in  [1,4,21,25,31,36-39] 
and  will  be  used  throughout  this  chapter. 

There  are  two  probabilistic  approaches  to  looking  at  the  disorder  problem,  the  Bayesian 
and  non-Bayesian  or  maximum  likelihood  approach  In  the  Bayesian  approach  [7,40], 
which  was  first  studied  extensively  by  Shiryaev  [36,37],  a  prior  distribution  is  assumed  for 
the  disorder  time  and  one  attempts  to  minimize  some  cost  functional.  Often  the  geometric 
and  exponential  distributions  are  assumed  for  the  discrete  and  continuous  times  cases, 
respectively. 

In  the  non-Bayesian  or  maximum  likelihood  approach  to  the  disorder  problem,  no  prior 
distribution  is  assumed  for  the  disorder  time.  In  off-line  processing  this  is  equivalent  to 
assuming  a  uniform  prior  distribution  over  the  observation  interval,  resulting  in  a  detector 
which  computes  the  likelihood  function  of  the  data  for  all  possible  disorder  times.  In  on-line 
processing,  the  disorder  time  is  considered  an  unknown  parameter. 

This  work  will  consider  only  the  non-Bayesian  formulation  of  the  quickest  detection 
problem.  The  situation  is  as  follows.  Consider  the  sequential  observation  of  independent 
random  variables  Xi,X2,...  such  that  Xi, . . .  ,Xfn^i  have  distribution  function  Fq  while 
•  •  •  have  distribution  function  Fj  ^  Fq.  The  two  distributions  may  not  be 
known  exactly  and  the  disorder  time  m  is  unknown.  In  this  setting,  it  is  necessary  to 
define  appropriate  performance  criteria  which  do  not  depend  on  the  true  disorder  time. 
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As  formulated  by  Lorden  [25]  for  the  discrete-time  ca^e,  the  criteria  we  use  are  the  mean 
time  between  false  alarms  and  the  expected  delay  which  are  defined  as  follows.  Let  N  be 
the  stopping  variable  associated  with  the  sequential  detection  algorithm.  Let  Pm  denote 
the  probability  measure  when  the  disorder  time  is  m  and  let  Em  denote  the  corresponding 
expectation.  Similarly,  let  Pq  and  £0  correspond  to  the  case  in  which  Xi,X2,---  all  have 
distribution  Fq.  Define  the  worst  case  expected  delay  in  detection  D  according  to 

D  =  sup  ess  sup  £rn{niax(0,iV  -  m  +  l)l.£m-i}» 
m>l 

where  Pm  is  the  sigma  algebra  generated  by  {Aj, . . . ,  Am}-  In  other  words,  D  is  the 
smallest  value  such  that  for  every  m  =  1,2, . .. 

Em{max(0,  A  -  m  +  I)]/;?!-!}  <  D 

almost  surely  under  Pq.  The  goal  of  the  quick  detection  of  the  change  is  in  competition  with 
the  desire  for  a  large  mean  time  between  false  alarms  T  which  is  defined  by  T  =  Eo{N). 
Thus,  the  performamce  of  a  given  algorithm  is  specified  by  the  pair  {T,D).  According  to 
these  criteria,  the  optimal  detector  would  have  the  minimum  expected  delay  D  for  a  fixed 
mean  time  between  false  alarms  T. 


Using  Lorden ’s  criteria  for  the  discrete-time  case  of  i.i.d.  observations  and  a  change  in 
distribution,  we  will  investigate  one  class  of  detectors  which  is  sufficiently  rich  to  include 
the  optimal  detector.  In  particular,  we  look  at  a  generalization  of  a  control  chart  procedure 
for  a  manufacturing  process  which  was  studied  extensively  by  Page  in  1954  [31].  Define  a 
cumulative  sum  statistic  5„  according  to 


=  E<7(X,) 

»=1 

Sq  =  z. 


(1) 
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Here,  g{»)  is  some  function  and  z  is  a  parameter  called  the  initial  score.  For  a  given  function 
ff  and  threshold  h,  let  the  stopping  vaaiable  N  be  defined  as 

N  =  inf{ra  >  1  :  S„  >  h).  (2) 

The  test  procedure  given  by  (1)  and  (2)  has  been  studied  only  when  g(»)  is  the  log-likelihood 
ratio,  g(x)  =  log[dF(xl0i)/dF(xj0o)].  For  this  case,  the  detector  has  been  given  various 
names  in  the  literature  including  Page’s  test,  the  cum-sum,  cusum,  or  cumulative  sum  test, 
and  Hinkley’s  detector.  We  will  refer  to  the  test  with  arbitrary  nonlinearities  also  as  Page’s 
test. 


When  Page’s  test  is  implemented  with  the  log-likelihood  ratio,  it  is  equivalent  to  a 
sequential  implementation  of  the  maximum  likelihood  detector.  The  log-likelihood  function 
of  the  observations  .  ,A'n)  is 


m  — 1  n 

/(A'i,...,.V„)  =  £  log{r//’o(A'.)]+  log[dFt(X,)] 
1=1  «=m 

which  can  be  rewritten 


/(.Yi,...,A'„)=  J^log 

i=m 


'dri(x,r 

.dFQ(X,). 


+  ^log[dFo(.V.)]. 
1=1 


Since  the  last  term  does  not  depend  on  the  disorder  time,  the  maximum  of  the  likelihood 


function  is  obtained  by  using  (l)-{2)  with  the  log-likelihood  ratio  and  initial  score  Sq  =  0. 
Lorden  proved  that  for  T  >  Tot  detector  minimizes  the  delay  in  the  class  of  <dl  detectors 
of  form  (l)-(2),  asymptotically  as  Tq  tends  to  infinity.  In  [29],  Moustakides  extended  this 
result  to  the  non-asymptotic  case. 


Dyson  [9]  investigated  the  locally  optimal  or  weak  signal  version  of  Lorden ’s  result. 
Specifically,  let  Pq  be  the  distribution  of  the  i.i.d.  observations  before  the  disorder  and 
let  Pg  be  the  distribution  of  the  i.i.d.  observations  after  the  disorder.  We  assume  that  9 
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parameterizes  a  family  of  distributions  {Pq,  &  >  0}  which  includes  Pq  and  limg_o  Pff  =  Po- 
Let  Pff(x)  be  the  probability  density  function  corresponding  to  Pg  which  is  assumed  to  exist 
and  to  be  differentiable  with  respect  to  6.  Then,  for  small  0  the  optimal  detector  in  the 
sense  of  Lorden  is  Page’s  test  with  g{x)  given  by 

where  B{9)  is  of  0{9^). 


In  this  chapter  we  investigate  the  performance  of  Page’s  test  for  arbitrary  nonlinearities 
g{x)  in  the  local  and  non-local  situations.  In  particular,  we  will  look  at  nonlinearities  which 
do  not  require  knowledge  of  the  probability  distributions  before  and  after  the  disorder.  In 
that  sense,  the  results  have  applications  in  nonparametric  quickest  detection  situations. 
Specifically,  we  will  examine  Page’s  test  when  implemented  with  the  linear  detector,  the  sign 
detector,  or  the  dead  zone  limiter.  The  analysis  for  the  linear  detector  can  be  applied  to  a 
large  class  of  nonlinearities  by  using  a  transformation  of  random  variables.  The  sign  detector 
and  dead  zone  limiter  are  a  special  cases  of  a  class  of  detectors  for  which  the  performance 
can  be  obtained  in  closed  form.  It  should  be  noted,  however,  that  the  performance  of  Page’s 
test  with  a  particular  nonlinearity  will  depend  on  the  actual  distributions  of  the  observations 
so  that  the  detector  is  not  nonparametric  in  the  most  strict  sense. 

Ail  previous  work  in  nonparametric  detection  in  the  disorder  problem  has  been  done 
in  the  off-line  setting.  In  [33],  Page  uses  a  fixed  sample  size  (off-line)  version  of  (l)-{2) 
implementing  the  sign  detector  and  evaluates  the  performance  in  several  examples.  Tests 
based  on  the  ranks  and  signs  of  the  data  are  examined  in  [3,17].  Other  nonparametric 
tests  include  extending  the  classical  tests  such  as  the  Kolmogorov-Smirnov  test  and  the 
Cramer-Von  Mises  test  to  the  disorder  problem  [8].  These  tests  look  for  a  change  in  the 
empirical  cumulative  distribution  function  to  determine  that  a  disorder  has  occurred  and 
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their  performance  does  not  depend  on  the  true  distributions.  Also  in  an  off-line  framework, 
Pettitt  [34,35]  uses  an  extended  Mann-Whitney  test  which  uses  only  the  signs  of  the  dif¬ 
ferences  of  the  observations  and  is  aJso  truly  nonparametric.  The  only  related  work  in  the 
sequential  framework  is  the  analysis  of  Page’s  test  with  observations  drawn  from 
done  by  Basseville  [1],  but  the  results  were  not  applied  in  the  nonparametric  framework. 


Because  (1)  involves  sums  of  a  memoryless  nonlinear  function  of  the  observations,  one 
might  expect  there  to  be  a  relationship  to  the  classical  detection  problem 


Hq  :  A'l,  A'2, . . . ,  A'n  are  i.i.d.  Fq  distributed 
versus 

Hi  :  A'l,  A'2, . . . ,  A'n  are  i.i.d.  Fi  distributed 


using  Sn  as  the  test  statistic.  For  the  case  when  Fi{x)  —  Fq{x  -  0)  corresponding  to  a 
shift  in  the  mean  of  the  distribution  Fq,  the  efficacy  £  of  the  detector  is  a  measure  of  the 
detector  performance  in  the  classical  problem  for  a  vanishingly  small  shift  in  mean  and 
asymptotically  large  sample  size.  In  Section  5,  it  will  be  shown  that  the  performance  of 
Page’s  test  in  the  weak  signal  case  is  related  to  the  detector  efficacy.  For  the  memoryless 
nonlinearity  g{*)  the  efficacy  is 


£  = 


lim  lim 

n— CXI 


n  Var[5„l/fo] 


^  (£(g^(A')l//o])^ 

Var[ff{A')|//o]  ■ 

Larger  values  of  the  efficacy  indicate  better  detector  performance.  The  nonlinearity  which 
maximizes  the  efficacy,  the  locally  optimal  nonlinearity  gio(x),  can  be  shown  to  be  [19,23,28] 


IM 

/o(^) 
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where  /o  is  the  probability  density  function  for  Fq.  In  this  case,  the  efficacy  is  equal  to 
Fisher’s  Information  If  [22] 


oo 

e  =  i}=  J  /o(x) 

— oc 


/o(x) 


2 

dx. 


The  remainder  of  this  chapter  is  organized  as  follows.  In  Section  2,  a  method  is  pre¬ 
sented  to  compute  performance  curves  for  Page’s  test  and  several  examples  are  given.  The 
performance  curves  suggest  an  asymptotic  performance  measure  which  is  defined  and  an¬ 
alyzed  in  Section  3.  Properties  of  this  measure  and  examples  of  its  use  are  presented  in 
Section  4.  The  performance  of  Page’s  test  in  the  weak  signal  situation  is  analyzed  in  Section 
5  and  conclusions  are  presented  in  Section  6. 


2.2  Performance  Curves 


VVe  have  seen  that  the  mean  time  between  false  alarms  is  T  =  Eo{N).  Although  not 
true  in  general,  the  expected  delay  given  by  (1)  is  simply  D  =  Ei{N)  for  Page’s  test  [2.25]. 
Thus,  the  computation  of  both  of  these  quantities,  T  and  D,  simplifies  to  determining  the 
average  sample  number  (ASN)  of  Page’s  test  when  all  the  observations  are  independent  and 
identically  distributed  with  distribution  F,  that  is  when  no  change  in  distribution  occurs. 
Several  approaches  have  been  taken  to  compute  the  ASN  for  Page’s  test.  Some  have  used 
Monte  Carlo  simulations  as  in  [l],  but  this  will  be  computationally  expensive  if  results  are 
desired  for  a  large  mean  time  between  false  alarms.  In  [6],  the  interval  between  0  and  h  is 
discretized  and  the  resulting  random  sequence  is  analyzed  using  Markov  chains.  For  fixed 
values  of  h,  this  approach  seems  promising.  However,  in  order  to  generate  performance 
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curves,  pairs  of  (T,  D)  are  required  for  many  values  of  h.  The  Markov  chain  approach 
requires  the  inversion  of  a  matrix  which  docs  not  permit  a  recursive  implementation  ais  h 
increases.  Our  approach  involves  disci ctizing  the  integral  equations  which  are  required  to 
compute  the  ASN  and  solving  a  system  of  linear  equations.  This  approach  has  been  used 
in  [12]  and  we  will  propose  an  alternate  technique  which  adlows  recursive  computation  as  h 
increases. 


To  determine  the  performance  curves  indicating  expected  delay  versus  mean  time  be¬ 
tween  false  alarms  for  Page’s  test,  it  is  useful  to  reformulate  the  sequential  procedure  in 
(l)-(2).  It  is  not  difficult  to  see  that  the  following  test  is  equivalent  [31].  Define  the  test 
statistic  5„  recursively  by 

5„  =  max{0,5n_i  +  giXn)} 


So  =  2, 

and  the  stopping  variable  N  is  given  by 


(3) 


N  =  inf{n  :  5n  >  h). 


(4) 


This  formulation  can  be  thought  of  as  a  repeated  Wald  sequential  test  with  boundaries  0 
and  h  in  the  following  sense.  Let  M  be  the  stopping  variable  for  the  Wald  test  given  by 


M  =  inf{n  :  Sn  <  0  or  Sn  >  h} 


(5) 


where  Sn  is  defined  in  (1).  If  the  Wald  test  hits  the  lower  boundary  0  first,  then  another 
Wald  test  is  performed,  and  so  on,  until  such  time  that  a  Wald  test  exceeds  the  threshold  h. 
See  Figure  1  for  an  illustration  of  the  equivalent  implementations  of  Page’s  test  in  (2)  and 
(4). 


For  convenience,  we  parameterize  the  distributions  as  follows. 


M 


(a) 


Figure  1.  Two  equivalent  implementations  of  Page’s  test.  In  (a),  the  cumulative  sum 
statistic  Sn  is  compared  to  the  previous  minimum  value  of  the  statistic.  In  (b),  a  two-sided 
sequential  test  is  implemented  in  which  the  test  statistic  5,,  is  reset  to  zero  if  it  would  have 
fallen  below  zero. 


15 


where  Bq  and  6\  are  some  parameters  such  as  the  means  of  the  random  variables  before  and 
after  the  disorder.  Denote  by  J^{z,d)  the  average  sample  number  of  test  (4)  with  initial 
score  Sq  •=  z  when  the  {A',}  are  i.i.d.  F{6)  distributed,  i.e., 

A^(^,0)  =  £[A|So  =  ^,^]. 


Similarly,  let  M{z,B)  be  the  ASN  of  the  Wald  test  (5)  and  let  V{z,B)  be  the  operating 
characteristic  (OC)  of  the  same  W'ald  test,  that  is,  the  probability  of  hitting  the  lower 
boundzu-y  first 

V{z,6)  =  P{Sm  <Q}. 


Then,  the  ASN  of  Page’s  test  can  be  computed  from  [31] 

The  mean  time  between  false  alarms  T  is  the  average  sample  number  of  test  (4)  with  initial 
score  5o  =  0  and  Xi  ~  £(0o). 


T  =  A^(O,0o)  = 


1-V{Q,Bq) 


(6) 


Recall  that  the  delay  in  detection  D  as  defined  by  Lorden  is  in  effect  the  worst  possible 
expected  delay.  Using  the  formulation  in  (4)  (see  Figure  1(b)),  it  is  obvious  that  the  worst 
case  delay  occurs  when  the  test  statistic  is  zero  at  the  disorder  time,  Sm  =  0,  since  then 
ihe  test  statistic  has  the  furthest  to  travel  when  the  disorder  occurs.  A  proof  of  this  result 
can  be  found  in  [2,25].  Thus,  the  delay  is 


D=A"(0,fli)  = 


M(O,0i) 


(7) 


In  this  way,  calculation  of  the  performance  curves  of  D  vs.  T  only  requires  calculating  the 
ASN  and  OC  of  a  Wald  test  with  initial  score  5o  =  0  for  the  two  possible  distributions. 


When  the  linearity  g{x)  =  i  is  used,  the  average  sample  number  and  operating  charac¬ 
teristic  of  the  Wald  test  with  initial  score  z  satisfy  the  following  Fredholm  integral  equations 
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of  the  second  kind  [11,31,32,44]: 


n 

M{z,9)  =  1  +  J  M{x,e)dF{x  -  2\B) 

0 

h 

V{z,e)  =  F{-z\d)  +  J  V{x,e)dF{x  -  z\9) 


6  [0,/i] 


No  analytical  solutions  can  be  found  for  these  equations  and  we  must  resort  to  numerical 
techniques.  We  replace  these  integrals  with  an  approximation  according  to 

__  J  _ 

M{z,e)  =  1  +  ^  WjM{xj,9)f{Xj  -  z\e) 

J 

V{z,9)  =  Fi-z\9)  +  ^  WjV{xj,9)fixj  -  z\9), 
j=i 

where  M(z,9)  and  Viz, 9)  are  the  solutions,  and  the  weights  {wj]  and  points  {ij}  are 
chosen  according  to  some  rule.  In  [12],  this  approach  is  taken  where  the  points  {ij}  are 
the  “Gaussian  points”  or  the  roots  of  the  Legendre  polynomial  and  where  the  weights  {wj} 
are  the  Gaussian  coefficients  [18].  Now,  by  evaluating  (9)  at  the  points  {xj}  which  are 
substituted  in  for  z,  we  obtain  two  systems  of  linear  equations  which  can  be  solved  by 
ordinary  techniques  to  find  the  appro.ximate  solutions  of  the  original  integral  equations. 
By  substituting  these  values  back  in  the  right  hand  side  of  (9),  j\d{z,9)  and  Viz, 9)  can 
be  evaluated  for  any  values  of  z  of  interest,  and  in  particular  for  z  =  0.  This  method  of 
solution  is  called  the  Nystrom  appro.ximation  method  for  the  solution  of  Fredholm  integral 
equations  of  the  second  kind  [5].  The  matrix  equations  to  be  solved  are  of  the  form 


(I  -  A)u  =  V 


where  the  matrix  (I  —  A)  has  entries  (I  -  A)ij  =  S,j  —  Wj/ix^  -  Xj\9).  Because  of  the  intro¬ 
duction  of  the  weights,  this  matrix  has  no  special  properties  that  allow  efficient  solution  of 
the  linear  equations.  Thus,  they  must  be  solved  by  a  technique  such  as  Gaussian  elimination 
which  requires  that  the  number  of  multiplications  for  the  solution  is  -t-  0(  J^)  [24]. 


Instead  of  using  the  Gaussian  points  and  weights,  we  can  use  a  simple  rectangular 
approximation  to  the  integrals  on  the  right  side  of  (9).  Again,  this  results  in  two  sets  of 
linear  equations  of  the  form  (10)  where  (I  —  A),j  =  S,j  —  A/(ii  -  Xj\6)  for  A  =  h/J.  This 
matrix  is  Toeplitz  and  there  exists  an  efRcient  algorithm  related  to  the  celebrated  Levinson’s 
algorithm  for  the  solution  of  the  equations  where  the  number  of  multiplications  is  3J^+0{J) 
[27].  Moreover,  the  algorithm  for  the  solution  is  a  recursive  one  which  computes  the  solution 
for  lower  order  systems  to  arrive  at  the  solution  for  the  desired  order.  In  other  words,  in 
solving  for  V{0,6)  and  M(Q,6)  for  a  given  h,  the  solution  is  also  computed  for  h,  =  iA  for 
i  =  1,...,  J  —  1  with  no  extra  computation.  In  contrast  with  this,  the  Nystrom  method 
requires  that  the  linear  equations  must  be  solved  anew  for  each  different  value  of  h  because 
the  matrix  is  very  general. 

The  results  in  [12]  indicate  that  in  certain  situations  only  fifteen  points  may  be  re¬ 
quired  in  the  Nystrom  method  for  accuracy  to  six  decimal  places.  It  is  well  known  that 
many  more  points  are  required  for  accurate  solutions  using  the  rectangular  approximation. 
Our  experimental  results  show  that  if  very  accurate  solutions  are  required,  the  Nystrom 
method  is  preferred.  However,  as  pointed  out  in  [12],  for  1  —  V(0,9)  <  10~®  the  number 
of  points  may  need  to  be  increased.  We  have  found  that  fifty  to  one  hundred  points  may 
be  needed  to  obtain  a  reasonable  solution  for  large  threshold  values  h  while  the  less  ac¬ 
curate  approach  using  the  rectangular  approximation  may  only  require  250  points.  Since 
the  Nystrom  method  must  be  repeated  many  times  to  specify  the  performance  curve  by 
calculating  pairs  of  {T,D}  for  many  values  of  h,  it  may  be  worthwhile  to  consider  the  faster 
approach  using  the  rectangular  approximation.  Although  speed  is  not  necessarily  a  factor 
in  calculating  the  curves,  they  may  take  mi  iy  hours  of  computation  time  for  large  values 
of  h  for  the  faster  method,  so  that  one  may  want  to  consider  speed  over  accuracy. 
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Both  algorithms  were  implemented  on  a  computer.  It  was  found  that  the  solution  for 
V{0,6)  became  inaccurate  as  h  grew,  to  the  extent  that  for  h  large  enough  V{0,d)  was 
computed  to  be  larger  t.han  one.  Since  what  is  required  to  calculate  Af{0,9)  is  actually 
Q(O|0)  =  1  -  7^(0, 9),  the  integral  equation  for  C2(2:,  6)  was  utilized.  Substituting  1  —  Q(  z,  9) 
into  the  integral  equation  for  V{z,9)  yields 

h 

Q(zl0)  =  1-  F{h-  z\9)  +  J  Q{x\d)dF(x  -  z\9). 

0 

Q(O10)  Wets  computed  according  to  both  algorithms  and  the  solutions  were  better  behaved 
for  larger  values  of  h. 

The  procedure  above  could  theoretically  be  applied  when  g  is  an  arbitrary  nonlinearity 
by  making  the  transformation  of  random  variables,  Y  =  5(.V),  so  that  the  resulting  detector 
is  again  linear. 


When  the  nonlinearity  g  takes  only  the  values  in  {-1,0, +  1},  the  solution  for  A'(O,0) 
is  known  exactly  in  closed  form  from  results  in  random  walk  theory.  Such  a  nonlinearity 
will  be  called  a  random  walk  nonlinearity.  Examples  of  such  nonlinearities  are  the  dead 
zone  limiter  where 

{-1,  for  I  <  —  d 
0,  for  -d  <  X  <  d 
1,  for  I  >  d 

and  the  sign  detector  which  is  obtained  by  setting  d  =  0.  For  the  boundary  ft  a  positive 


integer,  the  ASN  of  Page’s  test  with  a  random  walk  nonlinearity  is  [26] 

fMi  +  M 

-  if  P  =  g 

P  +  <7 


where  p  =  p(9)  and  q  =  q{9)  are  defined  by 

P=  P{9{.\)=1\9} 


q=  P{g[.\)  =  -l\e]. 
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Thus,  the  performzmce  curves  are  found  by  computing  T  =  A^(O,0o)  D  =  A/'(O,0i)  for 
a  range  of  boundaries  h  =  1,2,.... 


To  demonstrate  the  use  of  the  performance  curves,  the  following  disorder  problem  is 
considered.  The  distributions  before  and  after  the  disorder  time  are  given  by  the  symmetric 
signal  in  additive  noise  situation  where  F(i|0o)  =  ■^"0(2^  +  =  Fq{x  -  6)  for 

0  >  0.  The  noise  distributions  considered  are  the  Gaussian  with  density 

/o(^)  = 

v'lTTa- 

the  Laplace  with  density 

/o(t)  =  (2fT')“*''-e.xp(-v/2|x|/CT), 


and  the  Gaussian-Gaussian  mixture  with  ^otimiv 


foil}  =  (1  - 


-j 

The  Gaussian-Gaussian  mixture  density  is  the  first  two  terms  in  Middleton's  Class  A 
model  [19,43]  and  has  been  used  to  model  impulsive  noise.  The  parameter  t  indicates 
the  degree  of  contamination  and  is  typically  in  the  range  (0,0.25).  The  relative  strengt  ' 
of  the  contamination  is  given  by  the  power  ratio  7'  =  which  generally  has  values 

between  100  and  10,000.  We  use  this  distribution  because  by  adjusting  the  parameters  we 
can  determine  the  performance  for  a  wide  range  of  distributions  with  different  properties 
such  as  tail  heaviness. 


The  nonlinearities  implemented  in  Page's  test  are  the  linear  detector  g{x)  =  i,  the  sign 
detector,  and  the  dead  zone  limiter.  They  are  shown  in  Figure  2  along  with  optimal  nonlin¬ 
earities,  the  log- likelihood  ratios,  for  siginal-to- noise  ratios  (SNR)  of  0  dB  and  -20  dB.  The 
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Figure  2.  (a)  The  three  nonlincarities  used  in  Page’s  test  and  the  log-likelihood  ratios  for 
Gaussian,  Laplace,  and  the  Gauss-Gauss  mixture  noise  (c  =  0.01,7^  =  1000)  for  an  SNR  of 
(b)  0  dB  and  (c)  -20  dB. 
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dead  zone  limiter  has  the  parameter  d,  which  can  be  optimized  for  the  particular  distribu¬ 
tion.  However,  since  we  assume  that  the  true  distribution  is  unknown,  this  optimization  can 
not  be  done  a  priori.  Instead,  we  use  the  optimal  d  which  maximizes  the  classical  detector 
efficacy  in  the  ordinary  binary  hypothesis  testing  problem  for  the  Gaussian  distribution 
regardless  of  the  true  distribution  so  that  d  —  0.612tT  [20].  Thus,  we  assume  that  the  noise 
variance  is  known  or  can  be  estimated. 

Figure  3  shows  the  performance  curves  of  the  three  detectors  for  signal-to-noise  ratios 
of  0  dB  and  —20  dB.  The  results  for  the  linear  detector  were  obtained  using  the  rectangular 
approximation  method.  For  the  relatively  large  SNR  of  0  dB,  the  performance  is  good 
in  nearly  all  situations  as  shown  in  3(a)  —  only  a  delay  of  a  few  samples  for  a  mean 
time  between  false  alarms  on  the  order  of  10,000  samples.  The  sign  detector  and  dead 
zone  limiter  perform  well  for  heavy-tailed  distributions  (the  Laplace  and  the  Gauss-Gauss 
mixture)  whereas  the  linear  detector  can  perform  poorly.  In  Gaussian  noise,  the  linear 
detector  outperforms  the  others.  These  results  are  analogous  with  those  obtained  in  the 
classical  detection  problem  [20,42].  ll’hen  the  SNR  is  —20  dB,  we  see  from  Figure  3(b)  that 
the  performance  is  considerably  degraded.  Now,  for  a  mean  time  between  false  alarms  of 
10,000  samples,  the  delay  is  on  the  order  100  to  200  samples.  The  performance  curves  czm 
be  explained  qualitatively  by  comparing  the  nonlinearities  implemented  with  the  optimal 
nonlinearity,  the  log-likelihood  ratio.  The  linear  detector  is  optimal  for  Gaussian  noise,  to 
within  an  inconsequential  scale  factor.  However,  for  the  Laplace  and  mixture  distributions, 
an  important  feature  of  the  optimal  nonlinearities  is  the  limiting  of  outliers,  a  feature  also 
present  in  the  sign  detector  and  the  dead  zone  limiter. 

There  is  an  important  observation  we  can  make  about  the  curves,  independent  of  the 
detector  and  the  underlying  noise  distribution.  A  common  feature  of  all  the  performance 
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1o'  10^  10^  10“  10®  10® 


Mean  Time  Between  False  Alarms  T  (samples) 
(a) 


Mean  Time  Between  False  Alarms  T  (samples) 
(b) 


Figure  3.  The  performance  curves  for  Page's  test  implemcnte<1  with  the  linear  detector, 
the  sign  detector,  and  the  dead  zone  limiter  operating  in  Gaussian,  Laplace,  and  Gauss- 
Gauss  mixture  noise  (c  =  0.01,7*  =  1000)  for  a  symmetric  additive  signal  in  noise  with  an 
SNR  of  (a)  0  dB  and  (b)  -20  dB. 
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curves  is  that  for  large  values  of  the  mean  time  between  feilse  alarms  T,  the  delay  D  appears 
to  be  a  linear  function  of  the  logar  ithm  of  T.  Moreover,  for  large  T,  the  ratio  log(T)/D  — 
which  is  the  reciprocal  of  the  slope  —  would  be  an  appropriate  performance  measure.  In 
the  following  section  we  clarify  these  statements. 


2.3  Asymptotic  Performance  Measure 


As  suggested  by  the  performance  curves  in  the  previous  section  we  define  an  asymptotic 
performance  measure  rj  for  Page’s  test  given  by 


A 

T]  = 


lim 
T— *00 


logT 

D 


=  lim 
Z)— oc 

=  lim 

ft— 'OO 


logT 

D 

logT 

D 


(12) 


where  the  natural  logarithm  is  used.  The  measure  Tj  is  the  reciprocal  of  the  asymptotic 
slope  of  the  performance  curve,  and  larger  values  of  rj  indicate  better  performance.  In  the 
remainder  of  this  section,  we  will  derive  a  lower  bound  for  this  measure  and  show  that  it 
has  some  interesting  and  useful  properties. 


At  this  point  we  review  a  result  due  to  Lorden  [25].  Let  L  be  the  stopping  variable  of 
a  one-sided  test,  that  is, 

L  =  inf{/  :  Zi  >  h}, 

for  some  statistics  {Z/}  which  are  functions  of  the  i.i.d.  observations  A],  A2, _  For  k  = 

1,2, . . .,  let  Ljt  be  the  stopping  variable  of  the  same  test  applied  to  A'jt.  , . . .  and  define 
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N  is  the  stopping  Vciriable  of  the  test  which  stops  when  the  first  of  the  tests  applied  to 
XicXk+i,. . .  for  k  =  1,2,...  stops. 


Theorem.  (Lorden  [25])  Let  L  be  such  that  P{L  <  oo|flo}  <  Then, 

£[Xl^o]  >  1/q, 

and  for  any  alternative  distribution, 

£[.V|0i]  <  E[L\e,]. 


This  theorem  is  relevant  because  Page’s  test  with  zero  initial  score  is  exactly  the  first 
Wald  one-sided  sequential  test  operating  on  . . .  for  k  =  1,2,...  to  stop  as  is 

shown  in  the  example  given  in  Figure  1(b).  This  can  be  seen  from  the  view  that  when  the 
test  statistic  would  have  fallen  below  zero  —  indicating  that  the  test  starting  from  k  +  I 
will  reach  the  boundary  first  —  S*.  is  reset  to  zero  indicating  that  a  new  test  is  starting 
from  +  1. 

To  use  the  theorem,  we  examine  the  Wald  one-sided  test  with  stopping  variable  L 
defined  by 

L  =  inf{m  ;  Sm  >  h],  (13) 

where 

m 

1=1 

The  probability  that  this  test  terminates,  that  is  P{L  <  oc},  is  the  probability  that  the 
boundary  h  is  exceeded.  Although  this  can  not  be  computed  exactly  for  arbitrary  jumps 
g{X,),  an  estimate  is  provided  by  tlie  so-called  Wald  approximations  and  bounds  [11,44]. 


First,  we  recall  some  results  on  the  non-zero  root  u;  =  u;(^)  of  the  moment  generating 


function  identity 


£’(exp{i^’<7(-Y)}|6']  =  1. 


The  existence  and  uniqueness  of  such  a  root  is  guaranteed  when  giX)  ha^  a  non-zero  mean 
and  satisfies  certain  other  conditions  [11,4-1].  For  our  purposes,  we  will  assume  that  a  unique 
root  exists.  A  key  property  of  this  root  is  that  if  E[g{X)\d]  >  0  then  the  root  is  negative 
whereas  if  ^(.Y)  has  a  negative  mean,  the  root  is  positive  [lOj.  VV^hen  we  are  trying  to 
detect  a  change  from  a  negative  trend  to  a  positive  one,  we  assume  that  FJ[p(A')|^o]  <  0  < 
F[5(X)10i],  implying  that  w(0i)  <  0  <  u;(0o)- 


In  order  to  compute  the  probabilities  and  expectations  required  in  the  theorem  for  the 
one-sided  test  (13),  we  consider  the  two-sided  Wald  test  with  boundaries  b  <  0  <  h  given 
by  the  stopping  variable  M , 


M  =  inf{m  :  Sm  <  b  ot  Sm  >  h],  (15) 

with  Sm  as  in  (14).  The  one-sided  test  is  now  the  limit  of  two-sided  tests  as  b  tends  to 
negative  infinity,  i.e., 

liin  M  =  L 

6— •— oo 

in  probability.  Define  the  operating  characteristic  V{6)  of  the  two-sided  test  according  to 

V{e)  =  p{Sm  <b]. 

The  following  bounds  for  the  OC  will  be  used  when  >  0  [11,44]: 

exp{/t^'(g)}  -  1  ^  S(9)exp{h^i$)}  -  1 

exp{/ia;(0)}  -  K(0)exp{6-.'(0)}  ~  ~  ^(0)  exp{/i-;(0)}  -  exp{6w’(0)} 
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where 


k(0)=  inf 
l<f<oo 


exp{u;(0)f/(.V)}lexp{u;(fl)5(A')}  < 


6{6)  =  sup 
0<{<1 


exp{>^-(6')5(A')}!exp{^(fl)5(A')}  > 


It  follows  that  k{9)  <  1  and  6{d)  >  1.  Also,  we  will  use  the  following  bounds  for  the  average 


sample  number  Ai{d)  =  of  the  two-sided  test  [11,44]: 


Mi9){ 


> 

< 


bv{e)  +  [h  +  'i(6)][i-v(e)] 

E[g{xm 

b-v(e)  +  ii^i  -  v(e)] 
E[gH.\)\e] 

bv(e)  +  [h  +  i(d)][i-v{e)] 
E[g(xm 


ifE  [5(X)|0]  <  0 
ifE[5(X)|0]  =  O 
ifE[p(A')10]>O 


(17) 


where 


7(0)  =  sup  £[i/{A')  -  T\g{X)  >  r;0]. 
r>0 


The  probability  that  the  one-sided  test  (13)  does  not  stop  under  Oq  is  given  by  the  limit 
as  b  tends  to  negative  infinity  of  the  probability  that  the  two-sided  test  (15)  terminates  at 
the  lower  boundary  6: 

P{L  =  oc|0o}  =  lini  £(00) 

6— — oo 

^  _ exp{h,^(0o)}  -  1 _ 

“  6--0O  exp{/iw’(0o)}  -  K(0o)exp{6u;(0o)} 

_  exp{/iw-(0o)}  -  1 
exp  {/»,*;(  00 )} 

Therefore,  the  probability  that  the  one-sided  test  terminates  is  upper  bounded  by 

P{L  <  ocI0o}  <  exp{-/iu.'(0o)}. 


An  upper  bound  for  the  average  sample  number  of  the  one-sided  test  £(0i)  can  be 
obtained  by  using  the  upper  bound  for  the  ASN  of  the  two-sided  test  .V1(0i)  and  taking 
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the  limit  as  b  tends  to  negative  infinity; 

£(0i)  =  £[I10i]  =  lim  M{6i) 
fc— — oo 

,  6^(gi)-H[fi  +  7(gl)][l-'P(gi)] 

The  right  hand  side  is  a  decreasing  function  of  V[9i)  so  that  the  inequality  is  preserved  if 
we  replace  V{0i)  by  zero  resulting  in 


Now,  Lorden’s  theorem  can  be  applied  to  yield  results  on  Page’s  test  with  threshold 
h,  increments  g(X),  and  zero  initial  score.  The  mean  time  between  false  alarms  T  is  lower 
bounded  by 

T  =  £[A'15o  =  0,00]  >  exp{hw(0o)}  (18) 


while  the  delay  D  is  upper  bounded  by 

P  =  £[.V|5„  =  0,..l<|^.  (.9) 

Using  these  results,  we  can  lower  bound  the  performance  measure 


logjgXp  {/lw’(0o)}] 

-  A"^[h  +  7(^l)]/£l5(A')|0i] 

=  ^-(0o)r[5(A)|0i] 


assuming  that  7(^1 )  is  finite.  (This  is  a  reasonable  assumption  and  is  obviously  satisfied 
when  5  is  a  bounded  function.) 


The  same  result  can  be  obtained  in  a  more  direct  manner  by  adopting  the  approach 
in  [30].  Using  equations  (5)  and  (G),  we  see  that  the  ASN  of  Page’s  test  is  the  ratio 


28 


of  the  average  sample  number  to  the  operating  characteristic  of  a  two-sided  Wald  test  with 
boundaries  0  and  h.  Now,  by  using  the  bounds  in  (16)  and  (17),  we  can  find  bounds  on 
directly  by  forming  the  desired  ratio  and  taking  the  limit  as  b  tends  to  zero.  We 
find  a  lower  bound  for  the  mean  time  between  false  alcirms  T  when  £[5(A')|0o]  <  0: 


T  =  A'(O,0o)  =  lini 


M{6q) 


i|0  1  -  V{6q) 


bV(9o)  +  [h  +  ^{9o)][l-V{eQ)] 

-  ifo  £[<7(A')ieo][l  -  nOo)] 

.  1  bV{9o)  h  +  i{9o) 

6to£[r/(A')|eo]l-n^o)  £[5(A')ieo]’ 


The  right  hand  side  is  a  decreasing  function  of  V{9o)  since  b  <  0.  Thus,  if  we  replace  V{9q) 
by  its  upper  bound  in  (16)  we  get 


T  >  Im  1>  (ji(go)exp{M^o)}  -  1  h  +  7(go) 

“  4|o  £[(7(A')10o]  l-exp{M^o)}  ■£^b(A')|5o]’ 

Both  the  numerator  and  denominator  abo%e  are  zero  when  6  =  0,  so  we  use  L'Hopital's  rule 
to  find 

T  >  1  1  -  <3(go)cxp{M^o)}  h  4-  7(^0) 

-  E[g(X)\9o]  u-(0o)  E[g{X)\9o] 

>  1  l-exp{Mgo)}  h  +  7(^0) 

-  £[(7(A')|^o]  -(^0)  E\3{X)\9o] 

where  the  last  step  follows  from  the  fact  that  ^(^o)  >  1-  If  •£^[5(A')l^o]  =  O'  we  use 
(17)  to  obtain 


T’  K'fn  c  \  r  A'f(^o) 


>  li^ 

-  m  £[i/-(A')ldo][l  -  n9o)] 

=  lini 

6TO  £[i7-’(A')|0o](1  -:P(0o)]  E\g'-(X)\9oy 


The  right  hand  side  is  an  increasing  function  of  'P(^o)  so  that  if  it  is  replaced  by  zero  the 
inequality  is  preserved; 

h^- 


The  importance  of  this  is  that  if  £'[£r(A!’)|^o]  =  0,  a  significant  penalty  is  paid.  The  mean  time 
between  false  alarms  is  now  a  quadratic  function  of  the  threshold  instead  of  an  exponential 
one. 


An  upper  bound  for  the  expected  delay  D  is  computed  in  the  same  manner  as  before 
using  (17)  where  now  £’[<7(A')(0i]  >  0: 

M(0l) 


£»  =  A/(0,^l)  =  lim 


610  1  -V{ei) 


<  lim 

6ro 


=  li 


E[g{X)mi-V{ei)] 
1 


6T0  £[5(A')10iir-m)  "  £[5(^)1«i]‘  . 

The  right  hand  side  is  decreasing  function  of  V(6i)  since  6  <  0.  Thus,  if  we  replace  T’(do) 
by  zero  the  ineoi:  iity  is  preserved  and  we  obtain 

n<i±2M. 

-  £l5(.V)|9,l' 

When  w(0o)  >  0  and  w(0i)  <  0,  we  have  that  as  h  oo 

T=A^(0,do)> 

£>=A'(0,fli)< 


bV{e,)  ^  /i  +  7(«l) 


_f,hw(9o) 


E[g(x)\9o]^i0o) 

h 


E[g{xm] 

so  that  the  performance  measure  is  lower  bounded  by 

^  ^  >  u;(0o)i^b(a;)l^l]-  (20) 

a— 00  JJ 

This  lower  bound  can  also  be  shown  to  be  an  approximation  to  the  actual  value  of  g  by 
repeating  the  derivations  above  using  the  Wald  appro.ximations  [11,44]  instead  of  the  bounds 
in  (16)  and  (17). 


The  bound  in  (20)  was  derived  for  £'[</(A')]0o]  <  0.  However,  if  £[5(A')|0o]  =  0.  we 
find  that  the  performance  measure  g  is  lower  bounded  by  zero.  This  demonstrates  that  is 
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worthwhile  to  bias  the  detector  when  it  is  known  that  before  the  disorder  occurs  the  detector 
output  will  have  zero  mean.  If  the  detector  is  biased  to  have  a  negative  mean  before  the 
disorder,  then  the  mean  time  between  false  alarms  will  be  an  exponential  function  of  the 
threshold  instead  of  only  a  quadratic  one.  Thus,  for  a  fixed  mean  time  between  false  alarms, 
the  delay  for  the  biased  detector  will  be  on  the  order  of  the  logarithm  of  the  delay  of  the 
unbiased  one.  It  should  be  noted,  however,  that  the  biased  detector  will  not  detect  changes 
in  mean,  which,  before  the  biasing,  are  less  than  the  bias  to  be  applied  so  that  care  must 
be  taken  in  biasing  the  test. 

Using  the  results  in  [21],  the  bound  on  the  performance  measure  can  easily  be  extended 
to  apply  to  the  so  called  “two-sided  Page’s  test”  in  which  the  change  in  distribution  is 
either  from  F{6q)  to  F{6i)  or  from  F{6i)  to  F{6q).  For  simplicity,  we  consider  only  the 
ordinary  version  of  Page’s  test  with  £[(7(A’’)16o]  <  0  <  £[(7(A')10i]. 


2.4  Properties  and  Examples  of  the  Performance  Measure 


Define  the  approximate  asymptotic  performance  measure  rj  by 

^  (21) 

which  is  also  a  lower  bound  to  the  true  value,  rj  has  the  following  desirable  properties. 

Property  1.  rj  is  invariant  to  scale  changes  in  g{x).  Let  g{x)  =  cg{x)  with  c  >  0.  Then 
Q{9o)  =  u!{6(,)Ic  and  =  c£'[£/(x)10i].  This  is  desirable  because  a  scale  change  can 

be  absorbed  into  the  threshold  and  will  not  affect  the  performance.  ■ 
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Property  2.  If  g  is  the  log- likelihood  ratio,  gix)  =  \og[dF{x\di)/dF{x\6o)],  then  the 
bound  is  tight,  i.e.,  g  =  tj.  Lorden  [25]  proved  that  for  the  log-likelihood  ratio  g  = 
where 


=  E  ^log 

OO 

=  /  log 

—  OO 


dF{X\h) 

dF{X\9^) 

dF{x\9,) 

dF{x\9Q) 


r^j 

dF{x\9^) 


which  is  called  the  discrimination  or  Kuliback-Leibler  divergence  in  information  theory. 
Since 


£ 


d^Xl9i) 

dF(Xl9o) 


dF(xl9i) 

dF(x\9o) 


dF(xl9o) 


=  1, 


it  follows  that  w(0o)  =  1-  So, 


rj  =  £[5(i)1^i] 

dF{X\9^) 


log 


dFiX\9o) 


=  1- 
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Property  3.  When  jf  is  a  random  walk  nonlinearity,  r)  =  tj:  We  can  find  77  explicitly  by 
using  the  closed  form  solution  for  the  average  sample  number  in  (10).  Under  the  assumption 
that  £[5(1)1^01  <  0  3iid  £[ff(^)|^i]  >  0,  we  have  that  q{9o)  >  p{9o)  while  p{9i)  >  g{9i). 
Thus,  the  mean  time  between  false  alarms  and  the  delay  are  asymptotically 


T  =  A^{0,9q) 
£)=A/'(0,fli) 


[qiOo)-p{W 

h 

p{9i)-  q(9i) 


as  h  — *  OO.  Thus,  tj  is  given  by 


T]  = 


lim 
/>— 00 


log(T) 

D 


[p{9i)-  ?((9i)]log 


9(^0) 

Pi&o)' 


(23) 


We  have  that 


+ 


This,  together  with 


yields  the  desired  result. 


u;(^o)  =  log- 


E{giXm]  =  piBi)-q{e,), 


Property  4.  rj  =  t)  for  the  Brownian  motion  analog  of  the  discrete-time  Gaussian 
ca.se  using  a  continuous-time  linear  Page's  test:  Here,  we  observe  the  Brownian  motion  Bt 
satisfying  the  stochastic  differential  equation 


(IBt  =  ptdi  +  i/tdii’i 


where  wt  is  a  Wiener  process,  pt  is  the  drift  given  by 


for  t  <  to 
\  ^1,  for  t  >  to 


with  ^0  <  0  <  ^1,  and  i/t  is  given  by 


_  r  ctq,  for  t  <  to 

\  <7i,  for  t  >  to- 


The  continuous-time  linear  Page  test  is  given  by  the  stopping  variable  U  given  by 


U  =  inf{u  :  Cu  >  h] 


where  Cu  is  related  to  the  observations  by 


Cu  —  Bu  ~  min  5(. 

Q<i<u 
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Property  5.  In  [16],  Ilinklcy  derived  an  asymptotic  performance  measure  for  the  nonlin¬ 
earity  g  in  an  off-line  formulation  of  the  disorder  problem  using  the  probability  of  detection 
and  probability  of  false  alarm  as  the  performance  criteria.  That  measure  H  is  given  by 
H  =  log  £'[exp{u;(0o)<7(A')10i}]  whereas  Tj  can  be  written  as  77  =  £  (log[exp{w)(0o)p{  A')|^i}]). 
By  Jensen’s  inequality  //  >  ^.  ■ 

Using  TJ,  the  bound  and  approximate  value  of  the  performance  measure,  we  can  compare 
the  performajice  of  various  nonlinearities  in  different  noise  distributions.  Thus,  we  have  an 
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analytic  measure  and  can  avoid  the  complicated  numerical  integration  discussed  in  Section 
2.2  to  generate  the  performance  curves.  Figure  4  shows  the  performance  measure  77  as  a 
function  of  SNR  for  the  linear  detector,  the  sign  detector,  the  dead-zone  limiter  {d  =  0.612cr), 
and  the  log-likelihood  ratio  for  Gaussian  noise,  Laplace  noise,  and  Gauss-Gauss  mixture 
noise  with  c  =  0.01  and  7“  =  1000.  (Note:  the  results  for  the  log-likelihood  ratio  for  the 
Gauss-Gauss  mixture  could  not  be  obtained  numerically).  In  Figure  4(b)  we  see  that  the 
performance  of  the  sign  detector  approaches  that  of  the  optimal  detector  for  small  SNR’s. 
This  is  reminiscent  of  the  result  from  clcLssical  detection  theory  that  the  sign  detector  is 
the  locally  optimal  detector  for  Laplace  noise.  This  will  be  examined  in  more  detail  in  the 
next  section.  Figure  4(c)  illustrates  that  the  linear  detector  can  perform  very  poorly  for 
the  Gauss-Gauss  mixture  where  for  an  SNR  of  0  dB  the  delay  for  the  linear  detector  will 
be  more  than  ten  times  larger  than  the  delay  for  the  sign  detector. 

The  performance  measure  of  the  three  detectors  for  the  Gauss-Gauss  mixture  noise  is 
shown  as  a  function  of  c  for  7*  =  100  and  1000  in  Figure  5.  We  see  that  the  sign  detector 
and  dead  zone  limiter  can  perform  much  better  than  the  linear  detector  even  though  the 
linear  detector  is  the  optimal  detector  for  ordinary  Gaussian  noise.  We  see  from  Figures 
5(a)  and  5(b)  that  the  delay  using  the  linear  detector  can  be  100  times  larger  than  the  delay 
using  the  sign  detector  for  certain  mixture  distributions. 

The  performance  measure  r)  can  be  interpreted  in  two  ways.  We  can  use  the  ratio  of  two 
performance  measures  to  indicate  the  relative  delays  for  the  two  situations  for  a  fixed  mean 
time  between  false  alarms  large  enough.  For  example,  for  small  SNR’s,  Figure  4(b)  indicates 
that  using  the  linear  detector  in  Laplace  noise  would  result  in  twice  the  delay  that  the  sign 
detector  would  have  for  a  large  mean  time  between  false  alarms.  Alternatively,  we  can  use 
rj  to  convert  a  large  mean  time  between  false  alarms  to  the  corresponding  delay  using  the 
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Figure  4.  The  performance  measure  7  as  a  function  of  the  signai-to-noise  ratio  for  Page’s 
test  implemented  with  the  log-likelihood  ratio,  the  linear  detector,  the  sign  detector,  or  the 
dead  zone  limiter  operating  in  (a)  Gaussian  noise,  (b)  Laplace  noise,  and  (c)  Gauss-Gauss 
mixture  noise  wdth  t  =  0.01  and  7'  =  1000. 
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definition  in  (11).  To  demonstrate  how  one  might  use  this  interpretation  of  the  performance 
measure,  consider  the  following  situation.  Suppose  we  are  observing  the  symmetric  additive 
signal  in  noise  disorder  problem  where  the  noise  distribution  is  unknown  but  is  assumed  to 
be  a  Gauss-Gauss  mixture  with  contamination  parameter  f  in  the  range  (0.05,0.25)  and  7 
in  the  range  (100, 1000).  The  SNR  is  known  to  be  0  dB.  Independent  samples  are  taken  at 
a  rate  of  10  kHz.  It  is  desired  to  have  less  than  one  false  alarm  per  hour,  or  less  than  one 
false  alarm  every  3.6  x  10^  samples.  We  note  from  Figure  5(a)  that  ^  w  3  in  the  range 
of  interest.  The  expected  delay  is  upper  bounded  by  log(T)/p  which,  for  this  example,  is 
less  than  six  samples  or  600  microseconds.  Similarly,  for  a  false  alarm  rate  of  one  per  ten 
hours,  the  expected  delay  is  less  than  seven  samples  or  700  microseconds. 


2.5  Local  Asymptotic  Performance 


The  performance  mecisure  5;  is  a  valid  lower  bound  for  all  values  of  the  parameters  ffg 
and  01.  Therefore,  we  investigate  its  properties  in  the  so-called  locaJ  or  weak  signal  case. 
In  addition  to  when  the  observations  justify  a  weak  signal  treatment,  the  local  performance 
is  also  important  from  a  nonparamelric  viewpoint.  Since  we  may  not  know  the  true  signal 
value  in  a  nonparametric  setting,  it  is  hoped  that  by  using  a  good  detector  for  the  weak  signal 
situation  —  which  will  not  depend  on  the  signal  value  —  we  can  achieve  good  performance 
in  the  non-local  situation  as  well. 

Specifically,  we  consider  the  symmetric  additive  signal  in  noise  situation,  i.e.,  F'(i|flo)  = 
Fo(x  +  0)  and  T(x|flj)  =  Fo(x  -  0)  whorp  Fq  is  some  probability  distribution  and  0  is 
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a  positive  parameter.  Let  y  be  a  random  variable  with  distribution  Fq  which  admits 
the  probability  density  function  fo(x).  In  addition  to  assuming  that  E[g{Y  -  d)]  <  0  < 
E\g{Y  +  ^)]  for  all  ^  in  a  neighborhood  of  zero,  it  is  assumed  that  E[g{Y  ±0)]  is  continuous 
at  0  =  0  so  that  ^[(/(y')]  =  0.  This  implies  that 


lim  Tj  =  lim  i^{-6)E[g{Y  +  B)]  =  0. 

This  w'ould  be  the  case,  for  example,  if  /o  is  an  even  function  and  p  is  tin  odd  function. 
Thus,  the  local  performance  of  Page’s  test  is  determined  by  the  first  non-zero  derivative 
with  respect  to  6  of  g.  Below,  we  derive  the  following  key  result  under  general  smoothness 
and  boundedness  constraints: 


lim  —T}  -  0 
0  (19 


(P 


9-Q  d9-  '£^['7'(y)] 


(25) 

(26) 


Here,  €  is  the  efficacy  of  the  detector  g  in  the  classical  detection  problem.  In  the  derivation, 
we  will  assume  that  limits  and  expectations  can  be  interchanged  freely,  and  similarly  for 
derivatives  and  expectations.  For  a  par  ticular  function  g  and  distribution  Fq  these  steps  can 
be  verified  using  the  Lebesgue  Dominated  Convergence  Theorem  (see  [16]  for  an  example 
where  Fq  is  in  an  exponential  family).  We  will  further  aissume  that  all  random  variables 
considered  are  integrable,  that  is,  the  expectation  exists  and  is  finite.  Again,  whether  this 
is  true  in  a  particular  instance  would  need  to  be  verified. 

In  deriving  (25)  and  (26),  we  consider  two  cases:  when  g{x)  is  the  log-likelihood  ratio 
and  when  g{x)  is  a  general  nonlinearity  which  does  not  depend  on  6.  In  the  former  case, 
we  have  from  (22) 

=  £(.(r +«,)  =  £  (log  (2-) 


•to 


which  is  zero  for  6  =  0.  Under  suitable  regularity  conditions,  (27)  can  be  differentiated  with 
respect  to  6  yielding 


lim  ~Tj  =  lim  E 

8^0  dd  «_o 


-2/'(y  +  20) 


=  -  y  2/o(y)t/j/, 


and  this  is  zero  because  it  is  the  derivative  with  respect  to  s  at  s  =  0  of  the  identity 
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J  foiv  +  s)dy  =  1. 
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Similarly,  differentiating  (27)  twice  with  respect  to  9  and  taking  the  limit  yields 

oo  oo  r  2 

dy. 


Jim  ^^  =  -4  J  fQiy)dy  +  4  J  /o(y) 


/o(y)l 


fo{y) 


— oc  — oo  ^ 

The  first  term  can  be  seen  to  be  zero  by  differentiating  the  identity  (28)  twice  with  respect 
to  s  and  setting  s  =  0,  while  the  second  term  is  four  times  the  efficacy  of  the  log-likelihood 
ratio,  which  coincides  with  Fisher's  Information  for  the  distribution  Fq. 


When  g(x)  is  a  general  nonlinearity  which  docs  not  depend  on  9,  it  is  assumed  that 
u{-6)  —♦  0  as  0  —  0.  This  is  reasonable  since  the  moment  generating  function  A/(<)  = 
£'[exp{t5(y  -  0)}]  is  a  convex  function  of  t  with  M[t  =  0)  =  1  and  whose  slope  at  /  =  0 
is  equal  to  E{g{Y  —  0)},  which  we  assume  increases  to  zero  as  9  increases  to  zero.  Since 
u;{—9)  — *  0,  the  local  performance  of  Page’s  detector  is  determined  by  the  first  non-zero 
derivative  of  ^  with  respect  to  0  at  0  =  0. 

We  have  that  q  =  u.’{-6)E{g{Y  +  0)}  which  is  differentiated  to  yield,  under  suitable 
regularity  conditions, 

=  E{g{Y  +  0)}-^u^{-9)  +  ^{-9)E[g'{Y  +  9)].  (29) 


Since  £'[y(y)]  =  0  and  limg|o‘^’(~^)  =  0^ 


Differentiating  (29)  again  yields 


=  ^[A-0m9{Y  +  e)]  +  ±[u;i-e)]E[g'{Y  +  0)] 

+  ±[uf^-0)]Ey{Y  +  e)]+u{-0)E[g"{Y  +  0)] 

Therefore,  we  have  that 

=  21im— [u;(-«)]£[ff'(r)].  (30) 

«io  de-  eio  de'-  '  '’j 

Next,  the  identity  1  =  £  [exp{u;(-^)f7(y  —  ^)}]  is  differentiated  twice  with  respect  to  0  and 
the  limit  is  taken,  resulting  in 

0  =  -  2£Wy))  im, 


This  is  solved  yielding 


lim  -^u;(-0)  =  2 

e-^o  d0 


E{g'(Y)} 


or  lim  — u-'(-fl)  =  0. 

fl-.0  d0  ^  ’ 


(31) 


When  the  first  relationship  in  (31)  holds,  (26)  follows  from  (30). 


The  validity  of  these  results  is  demonstrated  in  the  following  examples. 

Example  1.  When  Page’s  test  is  implemented  with  the  dead  zone  limiter,  g  is  given  by 

(23) 

1  =  V  =  lp(6'i )  -  9(^1 )]  log  ^1^. 

We  also  assumed  unbiasedness  of  the  detector  in  the  noise-only  situation  implying  that 
p(0)  -  g(0)  =  0.  Since  p{0)  and  q{9)  are  given  by 

p{0)  =  I  -  Fo{d  -  0) 

q{9)  =  Foi-d  -  0) 

the  unbiasedness  condition  implies 

Fo{-d)  =  1  -  £o(d) 
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where  [-d,  d)  is  the  dead  zone.  Computing  yields 


,  ri  I-/J  /1\  I-  /  J  /1,1  r/o(-^  +  ^)  ,  foid+6) 

+  [1  -  f„(i  - «)  -  Fo(  -d  -  e,i 


which  is  zero  for  ^  =  0  because  of  the  unbiasedness  condition.  Computing  the  second 
derivative  of  rj  results  in 
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For  0  —  0,  this  reduces  to 


d6~  (e=o 


2[/o(d)H-/o(-d)]^ 
1  -  ^0(0") 


When  the  noise  has  a  symmetric  density,  this  becomes 


iir,  - 

d6~  e=Q  1  “  F’o(d) 


which  is  four  times  the  efficacy  [20]. 


Example  2.  For  Page’s  tost  implemented  with  the  linearity  g{x)  =  x,  rj  \s  given  by 
(24),  rj  =  26~f<T~.  The  first  derivative  is  ^77  =  49/a‘  which  is  zero  at  ^  =  0.  The  second 
derivative  is  a  constant,  =  4/(7",  which  is  four  times  the  efficacy  of  the  linear  detector 
in  Gaussian  noise  [19].  | 

The  fact  that  the  performance  measure  is  zero  when  there  is  no  signal  is  expected  and 
this  implies  that  performance  is  poor  for  signal  values  near  zero.  The  significance  of  the 


fact  that  the  first  derivative  of  the  performance  measure  is  elso  zero  at  0  =  0  is  that  the 
rate  of  improvement  in  performance  is  also  poor  for  small  signal  levels.  The  importance  of 
the  second  result  relating  the  second  derivative  of  the  performance  measure  at  ^  =  0  to  the 
efficacy  is  that  the  wealth  of  results  on  the  performance  of  detectors  with  cumulative  sum 
test  statistics  in  the  classical  hypotheses  testing  problem  for  detecting  a  weak  additive  signal 
in  noise  can  be  transfered  to  the  weak  signal  performcince  of  Page’s  test.  For  example,  it  is 
known  that  the  so-called  locally  optimal  nonlinearity  g{x)  =  foix)  maximizes  the 

efficacy  in  the  classical  detection  problem.  Therefore,  it  also  maximizes  the  performance  of 
Page’s  test  in  the  small  signal  case. 


2.6  Conclusion 

In  this  chapter  we  have  described  a  method  to  obtain  the  performance  curves  for  Page’s 
test  with  arbitrary  nonlinearities.  Perhaps  more  importantly,  an  asymptotic  measure  with  a 
simple  analytic  formula  was  derived  which  can  be  useful  in  comparing  different  nonlinearities 
for  a  variety  of  noises  in  the  desirable  situation  of  a  large  mean  time  between  false  alarms. 
By  using  the  Gaussian-Gaussian  mixture  family  it  was  shown  that  the  sign-based  Page’s 
test  can  outperform  the  linear  Page’s  test  for  a  wide  range  of  distributions,  which  is  a 
robustness  property  of  value  in  nonparametric  situations.  Finally,  by  examining  the  local 
performance  of  Page’s  test,  it  was  found  that  the  performance  is  directly  related  to  the 
efficacy  of  the  classiccd  binary  hypothesis  testing  situation.  Thus,  the  wealth  of  results 
on  using  memoryless  nonlinearities  in  classical  detection  theory  can  be  transferred  to  the 
quickest  detection  problem  using  Page’s  test. 
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3 


The  Application  of  Page’s  Test 

to 

Transient  Signal  Detection 


3.1  Introduction 


The  problem  of  detecting  unknown  transient  signals  is  an  especially  difficult  one.  The 
complications  arise  because  of  the  presence  of  several  unknown  parameters  which  force  the 
use  of  suboptimal  detectors.  These  unknowns  include  the  transient  signal  waveform  and  its 
amplitude,  the  duration  of  the  transient,  and  the  arrival  time. 

Several  approaches  have  been  taken  for  this  problem.  In  [23],  Wolcin  derives  a  detector, 
robust  for  the  transient  signal  waveform,  and  implements  it  sequentially  to  allow  for  arbi¬ 
trary  arrival  times  Friedlander  and  Porat  [7]  use  the  Gabor  representation  to  decompose 
a  received  waveform  into  a  two-dimensional  set  of  coefficients,  one  coordinate  representing 
the  complex  frequency  of  a  decaying  sinusoid  and  the  other  representing  the  arrival  time  of 
the  same  signal.  This  approach  is  investigated  in  Chapter  4.  Other  work  in  transient  signal 
detection  and  estimation  can  be  found  in  [8,11,15,18],  and  an  extensive  bibliography  can  be 
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found  in  [12]. 


In  this  chapter,  we  consider  a  new  approach  which  uses  Page’s  test  to  detect  transients. 
Although  this  test  has  been  used  in  the  related  area  of  signal  segmentation  in  conjunction 
with  speech  signals  [2],  the  majority  of  the  literature  evaluates  the  test  in  the  context  of  the 
disorder  problem  [3,5,17]  as  was  done  in  Chapter  2.  A  disorder  differs  from  a  transient  in 
that  once  the  disorder  occurs  the  change  remains  for  all  time,  whereas  a  transient  is  usually 
thought  of  as  having  a  finite  duration.  In  the  disorder  problem,  it  has  been  shown  that 
Page’s  test  implementing  the  log-likelihood  ratio  nonlinearity  is  the  quickest  detector  of  the 
disorder  [13.16].  Specifically,  for  a  lower  bounded  mean  time  between  false  alarms,  Page’s 
test  minimizes  the  worst  case  expected  delay  in  detection  once  the  disorder  occurs. 

The  rationale  for  using  Page’s  test  to  detect  transient  signals  follows  from  the  fact  that  it 
is  the  quickest  detector  for  the  disorder  problem.  Since  the  test  detects  the  change  as  quickly 
as  possible,  once  the  detection  occurs  it  is  irrelevant  that  the  disorder  continues  forever  — 
it  could  just  as  well  have  been  a  transient  that  ended  at  the  instant  after  the  detection. 
Conversely,  if  another  detector  performed  better  for  transients,  one  would  suspect  that  it 
would  be  quicker  than  Page’s  test  for  the  disorder  problem,  as  well.  However,  one  should  be 
cautioned  not  to  deduce  that  the  optimality  of  Page’s  test  for  the  disorder  problem  carries 
over  to  the  transient  problem.  This  can  be  understood  by  considering  those  realizations 
in  the  disorder  problem  which  correspond  to  delays  in  detection  larger  than  the  duration 
of  the  transient.  In  such  situations,  the  disorder  is  detected  with  some  delay  whereas  the 
transient  is  not  detected  a^  all. 

Another  factor  to  consider  in  the  question  of  the  optimality  of  Page's  test  for  transient 
signal  detection  is  the  problem  of  unknown  parameters.  Page’s  test  is  known  to  be  the 
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quickest  detector  when  the  log-likelihood  ratio  is  implemented  in  the  test  and  the  obser¬ 
vations  are  independent  and  identically  distributed  (i.i.d.)  with  one  distribution  before  the 
disorder  and  a  second  distribution  after  the  disorder  [13,16].  However,  in  the  transient 
signal  case,  the  i.i.d.  assumptions  may  not  hold,  and  even  if  they  did,  the  distribution  of 
the  transient  samples  including  all  the  parameters  such  as  amplitude  will  not  be  known  a 
priori.  Thus,  a  suboptimaJ  Page’s  test  would  have  to  be  implemented,  and  other  detection 
schemes  might  perform  better.  Despite  these  concerns  about  Page’s  test,  the  test  does  seem 
useful  for  detecting  a  change  which  occurs  at  an  unknown  time.  Thus,  the  evaluation  of  the 
performance  of  Page’s  test  for  transient  signal  detection  will  be  one  of  the  major  focuses  of 
this  chapter.  In  order  to  judge  the  performance  of  Page’s  test,  several  other  tests  found  in 
the  literature  are  also  evaluated  for  the  detection  of  transients.  Previously,  these  tests  had 
not  been  evaluated  for  transient  signal  detection  in  a  sequential  data  processing  framework. 

The  chapter  is  organized  as  follows.  The  use  of  Page’s  test  for  transient  signal  detection 
is  justified  in  Section  2  by  deriving  a  sequential  algorithm  to  determine  the  maximum 
likelihood  estimates  of  the  starting  and  ending  times  of  transients.  Approximate  methods 
of  evaluating  the  performance  of  Page’s  test  for  transient  signals  are  presented  in  Section 
3.  In  Section  4  the  performance  is  evaluated  for  the  specific  case  of  detecting  a  transient 
signal  from  observations  of  the  sample  energy  spectral  density.  Alternatives  to  Page’s  test 
are  presented  in  Section  5.  Calculated  and  simulated  results  are  discussed  in  Section  6  and 
conclusions  are  presented  in  Section  7. 
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3.2  Maximum  Likelihood  Estimation  for  Transients 


In  Chapter  2,  Page’s  test  was  shown  to  be  a  sequential  implementation  of  a  maximum 
likelihood  detector  under  the  assumption  that  the  change  in  the  underlying  process  can 
be  modeled  as  a  disorder  process.  However,  Page’s  test  has  only  been  considered  when 
it  is  assumed  that  once  the  disorder  occurs,  the  new  probabilistic  model  remains  for  all 
time  thereafter  [eg.  1,5,17].  This  is  obviously  not  the  case  for  transients,  for  although  they 
occur  abruptly  at  an  unknown  time,  they  also  disappear  (or  fade  away)  at  some  later  time. 
By  deriving  the  maximum  likelihood  estimates  for  the  starting  and  ending  times  of  the 
transient,  we  will  show  that  the  resulting  detector  also  implements  Page’s  test. 


We  assume  the  simplest  possible  model  for  the  transient  signal  in  noise.  Let  the  tran¬ 
sient  begin  at  time  to  and  end  at  time  <i.  Before  to  and  after  ti,  the  noise  is  modeled 
by  an  i.i.d.  random  process  with  distribution  Fq.  During  the  transient,  the  signal  together 
with  the  noise  is  modeled  by  an  i.i.d.  random  sequence  with  distribution  F].  Thus,  if 

A'l ,  A'2, . . .  1  A'jv  is  the  fixed  set  of  observations,  then  the  situation  is  as  follows: 

A'„  ~  i.i.d.  Fq,  1  <  n  <  <0 

X„  ~  i.i.d.  Fi,  <0  ^  n  <  /] 

Xn  ~  i.i.d.  Fq,  ti  <  n  <  N 

The  maximum  likelihood  estimates  of  tg  and  tj  are  given  by 


{<0-1  «i  N 

log/o(A'„)4-  !og/i(A'„)  +  log/o'A 

n=l  n=to  n=ti-|-l 

=  arg  max  |  log  77^  +  H  \ogfo{X, 

l„tro  n=l 


^n)|  • 


„)) 


/o(.Y„) 

We  now  consider  the  maximization  of  the  likelihood  function  as  indicated  above,  that  is 

(  h 


max 


H  log 


/l(A'„) 


(1) 
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ignoring  the  term  which  does  not  depend  on  the  transient  epochs.  In  the  maximization  of 
(1),  we  include  the  case  that  the  sum  contains  no  terms,  which  corresponds  to  the  situation 
in  which  no  transient  is  present.  For  this  case,  the  sum  is  by  definition  equal  to  zero,  and, 
therefore,  the  maximum  in  (1)  is  nonnegative.  Thus,  we  compute 

+  ■ 


max 


MX.) 


where  (•)"*■  =  max{0,  •}.  We  Hprive  a  sequential  algorithm  for  performing  the  maximization 
and  show  that  it  is,  indeed,  an  implementation  of  Page’s  test.  Assume  the  maximum  has 
been  computed  for  N  =  n,  and  let  7„  be  that  maximum  which  is  given  by 

+  '1 


7n  =  max  < 
l<«l<n 


.  ^  ,  hiXm) 

max  >  log  ■  — - 


The  maximum  for  A’  =  n  +  1,  7„+i,  can  be  computed  from 

+  '1 


7„+i  =  max 

l<tl<n  or  <i=n+l 


max  I  T 


(2) 


=  max  < 

[  max 

max  I 

Let  Sji  cLiid  Afi 

be  given  by 

Sn  = 

S  log 


<m=<o 


fliXrr,) 

/o(A'm) 


n  +  1 


,  max 
l<<0<n  +  l 


H  log 


^Tn=to 


/l(A'n) 

/o{  An  ) 


+  '1 


l<<0<r 


\m=to 


fliXm) 

MXm) 


A  /,(A'„) 


SO  that 


From  (2)  it  is  clear  that 


^n  +  l  —  (^n  +  An)"*”  . 


7n+l  —  fnax  {7n»  ^n  +  1  } 
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so  that  the  sequential  algorithm  for  computing  the  maximum  of  the  likelihood  function  is; 


^0  =  0 


70  =  0 


K  =  log 


fl(Xn) 

MXn) 


6n  =  max{0,6„_i  +  A„} 
7„  =  max{7„_i,6„}. 


n  =  1,2,. ..,N 


The  statistic  7„  is  an  indication  of  the  likelihood  of  a  transient  signal  of  some  length  in  the 
interval  [l,n]  —  the  larger  the  statistic,  the  more  likely  a  transient  is  present.  A  rezisonable 
detection  scheme  would  then  be 


7n 


> 

< 


h 


Declare  Transient  Signal  Present 
Continue,  n  =  n  +  1. 


(3) 


Note  that  is  identical  to  Page’s  statistic  Sn  in  (2.3),  and  that  7„  can  be  rewritten 


7„  =  max{io,^l.---,^n}  • 

This  implies  that  the  test  (3)  is  equivalent  to  Page’s  test  (2.4). 


By  not  integrating  over  the  observations  which  have  a  greater  probability  of  containing 
only  noise  than  noise  plus  signal.  Page’s  test  has  desirable  “windowing”  properties  which  are 
suited  for  detection  problems  when  events  occur  at  unknown  times.  The  effect  of  resetting 
Page’s  statistic  to  zero  instead  of  allowing  it  to  fall  below  zero  is  to  forget  adl  of  the  past 
observations.  In  such  a  situation  the  past  is  not  relevant  because  it  becomes  more  likely 
that  a  transient  has  not  yet  occurred.  This  resetting  has  the  effect  of  windowing  the  data 
so  that  the  noise-only  observations  before  the  transient  occur  are  ignored.  A  detection  will 
then  occur  if  the  signal  duration  and/or  signal-to-noise  ratio  are  large  enough  so  that  the 
test  statistic  exceeds  the  threshold. 
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3.3  Performance 


We  would  like  to  determine  the  performance  of  the  proposed  test.  To  do  so,  we  be¬ 
gin  by  evaluating  the  performance  for  the  disorder  problem  where  the  quickest  detection 
performance  criteria  are  relevant.  These  criteria  are  the  mean  time  between  false  alarms 
and  the  expected  delay.  Then,  using  these  results,  we  relate  performance  in  the  disorder 
problem  to  performance  in  the  transient  problem.  The  mean  time  between  false  alarms 
is  easily  related  to  the  false  alarm  rate,  but  the  same  can  not  be  said  for  the  expected 
delay  and  the  probability  of  detection.  Several  approaches  will  be  given  to  determine  the 
probability  of  detection  and  relate  it  to  the  performance  measures  developed  for  quickest 
detection  procedures. 


For  convenience,  we  parameterize  the  distributions  according  to  Fq  =  F(6q)  and  F\  = 
From  Chapter  2,  the  mean  time  between  false  alarms  T  can  be  lower  bounded  by 
the  Lorden  bound  (2.18) 

r  >  exp{u;(^o)/i}  (4) 

where  u;(0o)  the  unique  non-zero  root  (which  is  assumed  to  exist)  of  the  moment  gener¬ 
ating  function  identity 

£[exp{w5r(.Y)}|0]  =  1  (5) 


for  the  nonlinearity  g.  Also,  using  the  Wald  approximations  we  have  that 


T  a 


i:[i/(-V)|«o] 


1  -exp{/iu;(0o)} 


u;(^o) 


+  h 


(6) 


The  fadse  alarm  rate  o  is  related  to  the  mean  time  between  false  alarms  by  the  straightfor¬ 
ward  relation 


1 

“  ~  r 


(7) 
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which  has  units  “false  alarms  per  sample  A',”  (  note;  Xi  can  represent  one  random  variable 
or  a  vector  of  random  variables).  Because  we  use  a  one-sided  sequential  algorithm,  we  avoid 
the  confusing  terminology  “the  probability  of  false  alarm”  which  is  used  for  fixed  sample 
size  tests.  Indeed,  in  a  sequential  test  such  as  ours  the  probability  of  false  alarm  is  exactly 
one  because,  if  no  disorder  occurs,  the  test  will  terminate  at  the  boundary  h  if  one  waits 
long  enough.  The  critical  information,  however,  is  the  mean  time  between  false  alarms  or 
equivalently  the  false  alarm  rate.  The  worst  case  expected  delay  can  be  upper  bounded  by 


(2.19) 


Elgixm’ 


(8) 


or,  by  ignoring  the  excess  over  the  boundary,  we  have  that 


£b(A')|^i]- 


(9) 


In  Chapter  2  it  was  shown  that  an  appropriate  performance  measure  for  Page’s  test  is  the 
reciprocal  of  the  asymptotic  slope  of  the  performance  curve  indicating  D  versus  log(T)  as 
r  — •  00.  This  measure,  q,  is  lower  bounded  and  approximated  by  rj  given  by  (2.21) 


rj  =  w(5o)1^[5(-Y)|^i]  <  q  = 


lim 
T— *00 


log(r) 

D 


(10) 


The  expected  delay  and  the  performance  measure  are  valuable  criteria  for  the  disorder 
problem,  but  how  can  they  be  related  to  the  probability  of  detection  for  the  transient 
problem?  Indeed,  in  the  disorder  problem  the  probability  of  detection  is  exactly  one  because 
it  would  eventually  exceed  the  threshold  if  allowed  to  run  long  enough.  This  is  obviously 
not  meaningful  in  the  transient  situation  where  a  change  occurs  only  for  a  short  time. 

Some  simple  bounds  on  the  probability  of  detection  of  the  transient  can  be  obtained 
by  applying  Markov’s  inequality  on  the  positive  integer- valued  random  variable  L.  As 
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mentioned  above,  the  worst  case  expected  delay  can  be  calculated  by 


Z)  =  £[11^1,50  =  0]. 


(11) 


Since 

(n  +  l)^{I>n)  ^ 

where  is  the  indicator  random  variable  of  the  measurable  set  >1,  it  follows  that 

Let  T  be  the  integer- valued  duration  of  the  transient  and  let  /?  be  the  probability  of  detecting 
the  transient.  A  detection  will  be  defined  as  a  test  which  terminates  during  the  life  of  the 
transient.  Thus,  a  test  which  terminates  on  the  sample  after  the  transient  is  over  will  be 
considered  a  false  alarm  and  not  a  detection.  This  is  done  to  avoid  ambiguities  and  the 
question  of  until  how  long  after  the  transient  has  ceased  should  “detections”  be  attributed 
to  the  transient.  The  probability  of  detection  is  a  function  of  the  state  of  the  cumulative 
sum  statistic  at  the  moment  before  the  transient  occurs  so  that 

0  =  0(z)  =  V{L  <  rl^i,  5o  =  ^  >  0). 

0  is  lower  bounded  by  the  worst  case  scenario  in  which  5o  =  0  so  that  (11)  can  be  used  to 
find 


0  >  V{L  <  rl^i,5o  =  0} 
> 

“  r  +  l  ■ 


(12) 


This  formula  is  useful  if  the  expected  delay  is  small  in  comparison  with  the  transient  du¬ 
ration,  and  it  also  illustrates  the  obvious  fact  that  for  the  disorder  problem,  where  r  =  oc, 
the  probability  of  detection  is  one  when  the  expected  delay  D  is  finite.  Using  (8)  and  (10), 


which  can  be  substituted  into  (12)  to  lower  bound  the  receiver  operating  characteristic,  or, 


for  large  D, 


0> 


r+  1  + 


loga 

V 


r-l-  1 


(13) 


When  the  expected  delay  is  large,  asymptotic  results  on  the  distribution  of  the  run 
length  can  be  used  to  obtain  an  approximation  to  the  probability  of  detection.  Set  /x  = 
E\g{X)\6i\  and  <7^  =  Var[5(A’)|fli].  The  asymptotic  distribution  of  the  run  length  under 
F{6\)  is  as  follows  [lO] 

as  h  — ►  00,  where  N{a,b)  is  the  Gaussian  distribution  with  mean  a  and  variance  b.  Using 
the  asymptotic  distribution,  the  probability  of  detection  can  be  approximated  by 

0  =  V{L  <t}^  ^  I  I  (14) 

where  <^(-)  is  the  Gaussian  cumulative  distribution  function, 

^OO 

As  done  above,  substitutions  can  be  made  for  h  and  n  with  a,  D,  and  t)  to  obtain  additional 
relationships  from  (14). 

The  most  useful  approximation  can  be  obtained  by  using  the  well  known  result  [6]  that 
5n  has  the  same  distribution  as  A/„  where 

Mn  =  max{0,Si,S2,...,5„}, 

and  where  5„  and  Sj ,  52, . . . ,  5„  are  related  through  (2.1)  and  (2.3).  Again,  we  assume  the 
worst  case  formulation  and  shift  the  time  axis  so  that  the  transient  occurs  at  time  n  =  1 
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and  that  5o  =  5o  =  0.  Then  the  actual  probability  of  detection  is  lower  bounded  by 

0  >V  |s„  >  h  for  some  n  =  1,2, . . .  ,r| 


>  P{Sr  >  h} 

=  V{Mr  >  h} 

=  >  h  for  some  n  =  1, 2, . . . ,  r} 

(15) 

>  V{Sr  >  h}. 

(16) 

Another  approximation  can  be  obtained  by  modeling  the  cumulative  sum  statistic  by 
a  Brownian  motion  in  continuous  time.  Let  VV^(t)  be  a  Wiener  process,  h{t)  =  a  +  bt  for 
a  >  0  and  6  <  0,  and  define  a  stopping  variable  T{h)  by 

T{h)  =  inf{t  >  0lW{t)  >  h{t)]. 

The  distribution  of  T{h)  is  known  to  be  [13] 

P{T(A)<  exp 

n  ^  ' 
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Using  (15),  we  have  that 


0  >  V{f{h)  <  r). 


where  T{h)  is  defined  by 


f{h)  =  inf{<  >  0|5n  >  h). 


Now,  since  £[5„]  =  n^i  and  Var[5„]  =  n^r^,  we  make  the  identifications; 


Sn  -  n/i 
a 

h  —  nn 


a  +  hi. 


Thus,  the  probability  of  detection  is  approximately 


0  >  V{f  <  r) 
’■+7 


I-' 


r  ^  f  (*  -  AiO'  I 

/  j. — =^exp  < - r— n — 7  at. 


2ta'^ 


One  would  expect  this  approximation  to  be  better  when  the  i.i.d.  summands  forming  Sn 
are  Gaussian  random  variables  or  when  r  is  large. 


In  this  section,  we  have  developed  a  number  of  bounds  on  the  false  alarm  rate  and 
probability  of  detection  when  Page’s  test  is  used  for  transient  signal  detection.  Some  a 
these  bounds  are  derived  from  and  related  to  the  quickest  detection  performance  criteria 
for  the  disorder  problem.  In  particular,  the  simple  bounds  in  (4)  and  (16)  will  be  used  in 
conjunction  with  the  Monte  Carlo  simulations  in  Section  6. 


60 


3.4  Page’s  Test  Applied  to  the  Energy  Spectral  Density 


We  are  interested  in  detecting  transient  signals  using  the  sample  energj’  spectral  density 
(ESD)  as  the  observations.  In  this  regard,  we  are  motivated  by  the  work  of  Wolcin  [23]. 
A  detailed  description  of  his  method  is  given  in  the  Appendix,  and.  in  the  simulations  in 
Section  6,  that  method  will  be  compared  with  the  one  we  develop  here. 

W'hen  no  transients  are  present,  so  that  we  are  in  the  noise  only  situation,  we  assume 
that  the  background  noise  is  white  Gaussian  noise  (WGN).  We  assume  that  the  energj' 
spectral  density  is  white  merely  for  analytical  convenience.  For  a  colored  spectrum,  the 
approach  of  Wolcin  [23]  is  to  generate  an  unbiased  estimate  of  the  true  spectrum  and  to 
use  this  to  whiten  (ie.,  normalize)  the  spectrum.  The  same  approach  cam  be  used  here  so 
that  we  make  the  ideal  assumption  that  the  spectrum  is  white.  The  Gaussian  observations 
are  grouped  in  blocks  of  N  points  for  processing  via  the  Discrete  Fourier  Transform  (DFT). 
The  squared  magnitudes  of  the  N  complex  outputs  of  the  DFT  are  computed,  and  these 
statistics  are  available  for  the  detection  procedure.  These  random  variables  will  be  denoted 
by  {Xin},  1  =  1,2,...,  n  =  1,2, . . .  ,N  where  i  is  the  block  number  and  n  is  the  frequency 
bin  number. 

The  nonparametric  spectral  estimator  described  above  is  called  the  Periodogram  spec¬ 
tral  estimator  and  is  widely  used  because  of  its  simplicity.  In  [1,22],  a  more  complicated 
nonparametric  spectral  estimator  based  on  the  theory  of  evolutionary  spectra  due  to  Priestly 
[19,20]  is  used  in  conjunction  with  Page’s  test  to  detect  an  abrupt  change  in  the  spectrum  of 
a  piecewise  stationary  process.  However,  the  performance  of  these  tests  were  not  evaluated 
in  the  context  of  transient  detection.  In  [4],  an  autoregressive  moving  average  model  is 
assumed  for  the  data  but  no  evaluation  of  detection  performance  is  given. 
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Under  the  white  Gaussian  noise  assumption,  the  random  variables  will  be  in¬ 

dependent  and  identically  distributed  with  the  exponential  distribution  and  unity  mean. 
(Note:  this  ignores  the  fact  that  the  distribution  of  the  first  and  last  frequency  bins  will 
actually  have  the  \  j  distribution  with  a  variance  twice  that  of  tie  other  bins  [20],  or  it  can 
be  assumed  that  those  two  frequency  bins  are  not  used.)  Thus,  each  sample  ESD  will  be 

independent  of  the  others  and  will  have  a  probability  density  p„  given  by 

N  (  N 

=  n  exp{-i,„}  =  exp  ^  -  5]^  i,„ 

n=l  n=l 

for  i,„  >  0. 

The  transient  signal  we  are  attempting  to  detect  begins  at  the  unknown  block  to  and 
continues  through  the  block  tj.  As  in  [23],  we  make  the  following  ideal  assumptions  about 
the  sample  ESD  obtained  when  the  transient  is  present:  1)  the  random  variables  correspond¬ 
ing  to  different  frequency  bins  are  mutually  independent,  and  2)  the  sample  ESD's  corre¬ 
sponding  to  different  time  blocks  are  mutually  independent,  as  well.  The  distribution  of  an 
ESD  sample  with  a  transient  signal  present  (in  addition  to  the  WGN)  will  also  be  assumed 
to  be  exponential  but  now  with  a  mean  n  which  is  greater  than  one,  /^(j:)  =  exp(  -  x/p) 
for  X  >  0.  This  would  be  the  case,  for  example,  when  the  transient  signal  is  also  a  Gaussian 
random  process  which  is  independent  of  the  background  noise  process.  The  probability 
density  p,  of  the  spectral  samples  for  block  i  for  Iq  ^  £  h  can  now  be  written 

A  1  j  N  ^ 

Psix  i  \  1  Xj2i  ^  X,fi)  =  ~^inl  P%n  }  ~  GXp  ^  ^  X|„yp,„  -f  log  Pm  /  (  I  *  ) 

r.=  l  [  n=l  J 

foi  x,n  >  0.  Here,  p,„  >  1  with  strict  inequality  if  the  transient  signal  is  present  in  bin 
{i,n).  This  model  is  as  general  as  Wolcin’s  in  allowing  for  different  power  levels  {p,„}  in 
each  time/frcquency  bin. 


Because  the  parameters  {/!,„}  are  not  known  a  priori  and  because  the  distribution  of 
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the  t’th  ESD  need  not  be  identical  with  the  j’th  ESD  for  <0  <  Page’s  test  using 

the  optimal  nonlinearity,  the  log-likelihood  ratio,  can  not  be  implemented.  If,  on  the  other 
hand,  the  {/im}  were  known  and  did  not  depend  on  i  so  that  =  /in  for  to  <  i  <  ti,  then 
Page’s  test  would  be 


where 


q  >  L  Declare  Transient  Signal  Present 

'  <  ^  ^  Continue,  t  =  t  +  1 


Si  —  max{0,  •S|_]  +  5(-^il  >  -^12'  •  •  •  >  ^iN )} 


5o  =  0 

and  g  is  the  log-likelihood  ratio 


N 


5(A'.i,X.2,...,X.;v)=  5; 


n=l 


/in  -  1 


Mn 


^tn  -  log /in 


(18) 


This  nonlinearity  g  can  not  be  implemented  in  this  situation  because  the  parameters  {/in} 
are  not  known.  Many  nonlinearities  can  be  considered  instead  [see  Appendix  for  example]. 
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but  we  wiO  investigate  the  following  linear  “nonlinearity” 

N 

</(A',i ,  A',2,  . . . ,  A',,v )  =  ^  [A',n  -  1  -  <5] 

n=l 

where  (5  is  a  parameter  which  is  strictly  positive.  The  linearity  i  —  1  is  the  locally  opti¬ 
mal  nonlinearity  for  testing  between  the  hypotheses  “/i  =  1”  and  “/i  >  1”  for  univariate 
exponential  distributions  as  shown  below: 

/  ^  1-  _  1  =  Ml) 

ffjo(i)  =  hm  —  log-— — - — 

/iiiiaM  /^(.tIm  =  1) 

d  .  t7rexp{-i//x]} 

=  lim  —  log  — - ; - ^ - 

fiilldfi  exp{-x) 

=  lim  -  i/mi  +x} 

Mill  d/i 

=  lim  x/n]  -  l/fii 

Mill 

=  1-1. 

Moreover,  this  nonlinearity  is  uniformly  most  powerful  for  all  /i  >  1  in  the  same  univariate 
hypothesis  testing  problem.  In  the  multipari.meter/multivai iate  ca.se  using  the  ESD.  the 
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linearity  ^ven  will  not  be  the  lov.ally  optimal  detector  because  that  nonlinearity  requires 
taking  the  gradient  in  the  signal  direction  ai.J  will  result  in  uneven  weighting  of  the  fre¬ 
quency  bins.  This  can  not  be  done  here  because  the  are  not  known.  Also,  in  the 

multiparameter  case  in  the  present  context,  there  is  no  uniformly  most  powerful  test  to 
decide  between  mn  =  1,  n  =  1,2,. ..  ,N  and  /i,„  >1,  n  =  1,2, ...,  N.  Despite  these  draw¬ 
backs,  the  linear  function  in  (19)  which  implements  an  energy  detection  scheme  should  be  a 
good  test  for  our  situation  as  it  preserves  the  general  structure  of  the  optimal  nonlinearity. 
If  =  /i  for  to  <  i  <  tj  and  1  <  n  <  N,  then  we  could  TT':„ke  the  identification  from  (18) 
and  (19) 

1 -f  <5  =  — log/i.  (20) 

/i  -  1 

In  addition  to  the  fact  that  the  right  hand  side  of  (20)  is  strictly  greater  than  one  for  /i  ^  1, 
the  parameter  6  is  important  because,  as  shown  in  Chapter  2,  Page’s  test  performs  much 
better  when  the  mean  of  the  nonlinearity  before  the  disorder  takes  place  is  negative  as 
opposed  to  zero. 

The  false  alarm  rate  of  Page’s  test  implemented  with  the  nonlinearity  in  (19)  is  deter¬ 
mined  by  the  parameters  h  and  S.  Because  we  have  two  degrees  of  freedom,  there  are  many 
(h,S)  pairs  that  will  yield  the  same  false  alarm  rate  and  so  one  should  also  consider  how 
a  particular  pair  will  affect  the  detectability  of  transients.  The  determination  of  the  false 
alarm  rate  for  a  given  pair  (h,6)  does  not  depend  on  the  transient  but  only  on  the  noise 
distribution.  In  order  to  obtain  the  Lorden  bound  (4)  and  the  Wald  approximation  (6)  to 
the  mean  time  between  false  alarms  or  equivalently  the  false  alarm  rate,  it  is  necessary  to 
find  the  root  w  of  the  moment  generating  function  identity  (5) 

1  =  £'[exp{u;ff(A,,,A',2,.--,A',jv)}  lfJ,„  =  1,  n  =  1,2,...,  A] 


(  N  ] 

=  E 

exp  <  u;  ^  [A'.n  -  1  -  > 

Min  =  1,  n  =  1,2 . .V 

1  "=l  J 

. 
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(21). 


N 

=  n  -  1  -  ^)}  l/im  =  1] 

n=l 

=  (£1.XP  (a,(.Yi„  -  1  -  «))  |p„  =  1])*' 

Note  that  the  root  u  does  not  depend  on  N.  (This  fact  may  be  a  little  surprising  at  first 
glance.  One  would  expect  that  the  cumulative  sum  would  become  more  and  more  Gaussian 
as  N  increased  due  to  the  Central  Limit  Theorem  and  that  the  root  u}{N)  would  tend  to 
the  root  for  the  Gaussian  distribution,  which  is  ti^Gauss  =  2^,  as  iV  — ►  oo.  The  fallacy  in 
this  argument  arises  because  the  sum  of  the  N  random  variables  is  not  normalized  by  y/W 
which  would  permit  using  the  arguments  that  lead  to  the  Central  Limit  Theorem.)  After 
solving  (21)  for  u  (take  N  =  1),  equations  (4)-(7)  can  be  used  to  bound  and  approximate 
the  false  alarm  rate. 

Alternatively,  one  may  be  given  the  false  alarm  rate  a  and  a  minimum  value  of  Hi„ 
for  which  one  may  wish  to  test.  The  nonlinearity  (19)  implements  the  log-likelihood  ratio 
when  the  identification  (20)  is  made  with  n  —  ii\  and  when  the  threshold  h'  used  with  the 
log-likelihood  ratio  is  related  to  the  threshold  h  used  with  (19)  through 

h  =  -^h'.  (22) 

Ml  -  1 

Then,  since  the  root  of  the  moment  generating  function  identity  for  the  log-likelihood  ratio 
is  identically  equal  to  one,  it  follows  that  a  <  —  log/i'.  Now,  (22)  can  be  used  to  find  h  and 
(20)  can  be  used  to  find  6. 

It  is  intractable  to  analytically  evaluate  the  probability  of  detecting  a  transient  when 
the  transient  energy  can  be  arbitrarily  distributed  in  the  time/frequency  bins.  Thus,  we 
evaluate  the  performance  of  Page’s  test  for  the  following  specific  transient  structure.  For 
each  of  the  blocks  during  the  duration  r  of  the  transient  from  to  through  fi,  it  is  assumed 
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that  exactly  M  out  of  the  N  bins  will  have  transient  energy,  and  each  of  the  M  bins  will 
have  Hin  =  /i*  for  to  <  »  <  <i- 


The  signal-to-noise  ratio  S  =  u*  —  I  required  per  bin  to  achieve  a  given  performance 


measure  rj  is  found  from  (10)  and  (19)  to  be 


=  ±fJ- 

M  U(i) 


+  iV6  . 


It  is  clear  from  (23)  that  the  value  of  6  which  minimizes  the  SNR  does  not  depend  on  M. 


Using  the  lower  bound  to  the  probability  of  detection  in  ( 16)  we  have  that 


0  >  V{Sr  >  h) 

{r  N 

LE 

1=1  n=l 


(X.„  -l~6]>h 


Since  typical  values  of  N  are  64,  256,  and  1024,  the  product  tN  will  be  large  enough  to 
justify  using  the  Central  Limit  Theorem  to  compute  the  probability  in  (24).  The  mean  and 
variance  of  Sr  are  calculated  from  the  mean  and  variance  of  the  exponential  distribution 


yielding 


Thus,  the  bound  becomes 


£[5r]  =  rM(/i*-l)~r^i 
Var[5r]  =  tMh*^  t{N  -  M). 


h-TM{n*  -  1)  +  tN6 

Jt  +  at  -  a/) 


The  signal-to-noise  ratio  S  =  fi*  ~  \  required  for  at  least  50%  detectability  is  easily  found 


h  +  rN6 
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3.5  Alternatives  to  Page’s  Test 


The  performance  of  Page’s  test  with  the  linear  “nonlinearity”  will  be  compared  to 
several  other  procedures.  The  throe  other  tests  that  will  be  considered  are  the  “A'  block 
sliding  window  test,”  Wolcin’s  test,  and  Lorden’s  Test. 

The  block  sample  sliding  window  test”  has  been  called  the  “two  models”  approach 
in  connection  with  the  disorder  problem  [4]  (the  second  “model”  which  is  not  explicitly 
implemented  for  this  detector  is  used  to  estimate  the  background  spectrum  which  is  assumed 
known  here.)  For  the  present  application,  a  fixed  sample  size  test  statistic  T,^  based  on  the 
most  recent  K  blocks  of  AT-point  DFT’s,  blocks  i  through  i '  +  A'  —  1,  is  computed  according 
to 

i+A'-l  N 

E  (26) 

k=i  n=l 

and  is  compared  to  a  threshold  Hq.  The  test  is  implemented  sequentially  for  each  new  block 
computed  by  incrementing  i  in  (26)  which  amounts  to  using  the  new  DFT  and  throwing 
away  the  least  recent  DFT.  The  parameter  K  is  the  window  length.  While  the  probability 
of  false  alarm  is  easy  to  calculate  for  the  fixed  sample  size  test,  it  is  not  clear  how  to 
compute  the  false  alarm  rate  for  the  sequential  test.  An  approach  to  compute  the  false 
alarm  rate  might  be  to  find  the  expected  hitting  time  of  a  threshold  by  a  moving  average 
process.  However,  results  of  this  type  have  not  been  found  in  the  literature.  For  transients 
with  a  duration  r  with  r  <  A',  the  Central  Limit  Theorem  can  be  used  to  lower  bound  the 
probability  of  detection, 

^  >  1  _  t  -1’-) 

-\)  +  KnI 

and  the  SNR  5o  required  for  50%  detectability  is 


Upon  comparing  this  with  the  equivalent  expression  for  Page’s  test  (25),  we  note  there  is 
a  penalty  associated  with  Page’s  test  due  to  the  additional  term  involving  6.  However, 
the  threshold  for  Page’s  test  may  be  significantly  lower  than  the  threshold  for  the  sliding 
window  test  for  a  given  false  alarm  rate. 

Wolcin’s  test  is  described  in  detail  in  the  Appendix,  but  the  relevant  points  are  included 
here.  For  the  energy  detection  scheme,  Wolcin’s  test  computes  the  following  K  test  statistics 
for  fc  =  1,2,. ..  A'  from  the  most  recent  K  DFT’s,  numbered  i  —  K  +  I  through  i, 

=  E  (28) 

j=i-k+l  fi=l 

For  each  i,  the  K  statistics  in  (28)  are  compared  to  a  single  threshold  h\  and  if  any  exceeds 
the  threshold,  a  detection  is  declared.  Essentially,  the  statistics  in  (28)  are  used  to  detect 
a  transient  of  any  possible  length  from  one  to  K.  In  some  sense,  the  sliding  window  test 
is  a  poor-man’s  implementation  of  Wolcin’s  test  although  this  is  not  meant  to  imply  that 
Wolcin’s  test  is  necessarily  better  than  the  sliding  window  test.  For  i  fixed  in  (28),  it  is  not 
clear  how  to  compute  the  probability  of  false  alarm  with  this  multiple-hypothesis/multiple- 
statistic  formulation.  No  analytical  expression  could  be  found  for  the  false  alarm  rate  for 
the  sequential  implementation  of  the  test  either.  Using  the  Central  Limit  Theorem,  the 
probability  of  detection  of  a  transient  of  length  r  with  r  <  K  can  be  lower  bounded  by 

/?>!-$ 

The  SNR  5i  required  for  50%  detectability  is  given  by 

A  procedure  due  to  Lorden  [13]  which  extends  Page’s  test  so  as  to  test  a  simple  hy¬ 
pothesis  against  a  composite  alternative  warrants  investigation.  Specifically,  in  the  disorder 


hi  -  y/rMin*  -  1) 
v/A//i*2  +  {N  -  M) 
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problem,  let  Fq  be  the  distribution  of  the  i.i.d.  observations  before  the  disorder.  Let  B  par 
rameterize  an  exponential  (Koopman-Darmois)  family  distribution  Fg  which  includes  Fq. 
Once  the  disorder  takes  place,  the  i.i.d.  observations  are  drawn  from  the  distribution 

dFgiy)  =  exp  (By  -  b(B))dFo(y) 

for  some  B  >  Bi  and  where  6(0)  =  0.  Let  N  be  the  following  stopping  variable: 

N  =  inf{n  :  sup  (BS„  —  nb(8))  >  62  >  0},  (30) 

e>di 

where  5„  is  the  cumulative  sum  of  the  observations,  Sn  =  +  •  ■  •  -)-  Vn-  Let  N*  be  the 

extended  stopping  variable  given  by 

.V*  =  min{.VL  +  A:- 1}  (31) 

i>l  ^ 

where  is  N  applied  to  yit,yifc+i, _ The  false  alarm  rate  of  the  extended  stopping  rule 

is  easily  shown  to  be  upper  bounded  by 

a  <  exp{-h).  (32) 

Also,  this  procedure  is  optimal  in  the  quickest  detection  sense  asymptotically  as  /i  -*  00 
for  all  B  >  Bi-  This  is  analogous  to  a  uniformly  most  powerful  test  in  a  binary  hypothesis 
testing  problem. 

The  following  comments  are  in  order  before  using  this  test  to  detect  transients  from 
the  energy  spectral  density.  First,  the  optimality  of  the  test  is  only  in  the  asymptotic  case 
whereas  the  optimality  of  Page’s  test  in  the  disorder  problem  has  been  established  in  the 
non- asymptotic  case  as  well  [16].  Second,  the  result  applies  only  to  a  single  parameter  ex¬ 
ponential  family  distribution  —  this  is  analogous  to  the  fact  that  uniformly  most  powerful 
tests  exist  for  the  single  parameter  exponential  family  distribution  but  not  for  the  multipa¬ 
rameter  distribution.  Clearly,  when  a  transient  is  present,  the  ESD  has  a  multiparameter 
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exponential  family  distribution  (17).  Finally,  the  implementation  of  the  test  (30)  requires 
considerably  more  computation  and  memory  than  any  of  the  other  tests  (see  [13]  for  details 
on  the  implementation). 


Since  this  procedure  tests  for  a  composite  alternative,  one  would  hope  that  it  would 
be  robust  for  transients  with  unknown  amplitude.  Also,  as  for  Page’s  test,  the  false  alarm 
rate  depends  only  on  the  null  distribution  of  the  test  statistic.  For  the  ESD,  the  null 
distribution  of  the  test  statistic  has  a  single  parameter  exponential  family  distribution  since 
the  observations  1',,  i  =  1.2,...,  correspond  to  A',„,  i  =  1,2,...  which  have  the 

Erlang-7V  distribution. 


Pv,(^) 


^exp{-y} 

(AT-l)! 


for  y  >  0.  Thus  the  bound  (32)  on  the  false  alarm  rate  holds.  Unfortunately,  no  simple 
bounds  on  the  probability  of  detection  have  been  obtained. 


Finally,  procedures  which  implement  more  than  one  test  in  parallel,  such  as  several 
Page’s  tests  in  parallel  or  a  parallel  combination  of  Page’s  test  and  Wolcin’s  test,  could  be 
studied,  but  they  will  not  be  considered  here. 
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3.6  Results 


In  Figure  1,  the  root  of  the  moment  generating  function  identity  is  plotted  as  a  function 
of  6  as  given  by  (21).  As  is  evident  from  (21),  the  root  is  bounded  by  w  <  1.  Since 
Q  <  exp{-u>/i},  larger  values  of  lj  are  desired  but  this  goal  is  in  opposition  to  having  lower 
values  of  6  to  have  lower  required  SNR  through  (25). 

Figure  2  shows  the  SNR  required  per  bin  for  Page’s  test  to  achieve  performance  measures 
of  q  =  0-1,1,  and  10  as  a  function  of  the  number  of  bins  M  with  transient  signal  energy 
present  out  of  JV  =  64  and  N  =  1024  bins  as  given  by  (23).  Recall,  q  is  a  lower  bound 
for  the  ratio  (logT)/D  asymptotically  as  the  mean  time  between  false  alarms  T  tends  to 
infinity.  Consequently,  a  larger  q  corresponds  to  a  smaller  delay  D  in  detecting  the  disorder 
or  transient.  The  values  of  6  used  were  chosen  to  minimize  the  required  SNR  per  bin.  These 
graphs  are  easily  interpreted  for  the  disorder  problem  as  we  now  demonstrate.  Say  we  want 
to  lower  bound  the  mean  time  between  false  alarms  by  T  >  10*  blocks  of  64  samples  each. 
Then,  using  ( 10)  we  see  that  for  q  =  0.1,  the  expected  worst  case  delay  is  given  by  D  ss:  184.2 
blocks;  for  ^  =  1,  Z?  5s  18.4  blocks;  for  7  =  10,  Z?  a  1.8  blocks.  The  relations  (12)-(14)  can 
be  used  to  yield  bounds  on  the  probability  of  detection  of  transients.  For  examp'e,  for  large 
D,  (14)  can  be  used  to  show  that  transients  of  length  D  will  be  detected  with  approximately 
50%  probability  assuming  the  required  SNR  per  bin  as  indicated  in  the  figure  is  available. 

The  false  alarm  rates  of  the  various  tests  are  plotted  versus  the  threshold  in  Figure  3  for 
iV  =  64  and  N  =  1024.  Monte  Carlo  simulations  were  used  to  compute  the  false  alarm  rate 
for  Page’s  test  with  6  =  0.1  for  iV  =  64  and  with  6  =  0.05  for  N  =  1024.  The  corresponding 
Lorden  upper  bounds  (4)  and  Wald  approximations  (6)  are  also  shown.  The  values  of  6 
were  not  chosen  in  any  particular  optimal  fashion  but  the  general  guiding  principle  was  to 
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Root  of  the  Moment  Generating  Function  co 


Figure  1.  The  root  of  the  moment  generating  function  identity  ^ven  by  equation  (21)  as 
a  function  of  6. 
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Number  of  Bins  with  Transient  Signal  Energy 
(a) 


Number  of  Bins  with  Transient  Signal  Energy 
(b) 


Figure  2.  The  signaJ-to-noise  ratio  required  per  bin  for  Page’s  test  to  achieve  the  indicated 
preformance  as  a  function  of  the  number  of  bins  M  with  transient  signal  energy  present 
out  of  (a)  N  =  64  and  (b)  N  =  1024  bins. 
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Figure  3.  The  false  alarm  rates  of  the  tests  considered  as  a  function  of  the  threshold  for 
a  DFT  of  length  Ca)  jV  =  64  and  (b)  N  =  1024. 


lower  the  detection  threshold  by  choosing  a  larger  b  while  attempting  to  minimize  the  effect 
that  a  larger  6  has  on  the  required  SNR.  Simulations  were  also  run  to  determine  the  false 
alarm  rate  for  the  A'-block  sliding  window  test  and  Wolcin’s  test.  The  window  size  for  both 
of  these  tests  was  chosen  to  be  K  =  16  for  N  =  64  and  K  =  8  for  =  1024.  The  upper 
bound  for  the  false  alarm  rate  for  Lorden’s  test  given  by  a  <  exp(-/i)  is  not  shown  and  no 
simulations  were  performed. 

The  SNR  required  per  bin  for  50%  detectability  as  a  function  of  the  number  of  bins 
with  transient  signal  energy  present,  M,  out  of  N  =  64  and  N  =  1024  bins  is  shown  in 
Figures  4  and  5,  respectively.  The  false  alarm  rate  was  set  at  o  =  0.001  false  alarms  per 
DFT,  and,  with  the  exception  of  Lorden’s  test,  the  thresholds  were  determined  from  the 
Monte  Carlo  simulations  depicted  in  Figure  3.  For  Lorden’s  test,  the  threshold  was  set 
using  the  upper  bound  (32),  and  the  value  of  9i  in  (30)  was  chosen  to  correspond  to  a 
signal-to-noise  ratio  of  -20  dB.  The  results  are  shown  for  transients  of  length  1  <  r  <  32 
when  =  64  and  for  1  <  r  <  16  when  N  =  1024.  The  upper  bounds  for  Page’s  test  (25), 
the  sliding  window  test  (27),  and  Wolcin’s  test  (29)  are  shown  together  with  the  results 
of  Monte  Carlo  simulations  performed  for  all  four  tests.  The  simulations  show  that  Page’s 
test,  Wolcin’s  test,  and  Lorden’s  test  can  be  significantly  better  than  the  sliding  wdndow 
test,  by  approximately  5  dB,  for  detecting  short  transients  and  about  as  good  to  within  1 
dB  for  detecting  longer  transients.  Moreover,  these  results  do  not  depend  on  the  number 
of  bins  with  transient  signal  energj-,  M.  Page’s  test  is  the  easiest  of  any  of  the  tests  to 
implement.  It  also  requires  the  least  memory,  needing  to  store  only  a  single  scalar  value. 
To  within  a  half  dB,  it  performs  as  well  as  the  other  tests  in  nearly  all  cases  considered. 


(a) 


(b) 


Figure  4.  The  ‘signal -to- noise  ratio  required  per  bin  for  507t  detectability  with  false  alarm 
rate  o  =  0.001  false  alarms  per  DFT  as  a  function  of  the  number  of  bins  A/  with  signal 
energy  for  a  DFT  of  length  =  64  with  a  transient  of  length  cf  (a)  r  =  1.  (b)  r  =  4,  (c) 
r  =  8,  (d)  r  =  12,  (e)  r  =  16,  and  (f)  r  =  32  blocks.  For  Page  s  test,  6  =  0.1  was  used. 
The  window  size  for  the  sliding  window  test  and  VVoIcin's  test  is  A'  =  16.  Lorden's  test  was 
designed  for  a  mininuim  SNR  of  -20  dB. 
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(a) 


Number  of  Bins  with  Transient  Signal  Energy 
(b) 


Figure  5.  The  signal-to-noise  ratio  required  per  bin  for  50%  detectability  with  false  jJarm 
rate  o  =  0.001  false  alarms  per  DFT  as  a  function  of  the  number  of  bins  M  with  signal 
energy  for  a  DFT  of  length  N  =  1024  with  a  transient  of  length  of  (a)  r  =  1,  (b)  r  =  4, 
(c)  r  =  8,  and  (d)  r  =  16  blocks.  For  Page’s  test,  6  =  0.05  was  used.  The  window  size  for 
the  sliding  window  test  and  Wolcin’s  test  is  A'  =  8  blocks.  Lordcn's  test  was  designed  for 
a  minimum  SNR  of  -20  dB. 
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3.7  Conclusion 


The  detection  of  transient  signals  is  an  especially  difficult  problem  because,  in  general, 
transients  occur  at  unknown  times  and  have  unknown  durations,  waveforms,  and  ampli¬ 
tudes.  By  using  maximum  likelihood  considerations  we  have  addressed  the  problems  of 
unknown  arrival  time  and  unknown  duration  of  transient  signals,  and  the  resulting  de¬ 
tector,  Page’s  test,  performs  favorably  in  this  regard.  Various  detectors  which  have  been 
proposed  in  the  literature  were  analyzed  in  the  context  of  transient  detection  and  simular 
tions  were  used  to  determine  their  performance.  Page’s  test,  which  is  the  easiest  of  the  tests 
to  implement,  performed  very  well  in  comparison  to  the  other  tests. 
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3.8  Appendix  -  Summary  of  the  Approach  of  Wolcin 


3.8.1  Introduction 

In  [23],  Wolcin  derived  a  test  for  detecting  transient  signals  using  the  sample  energy 
spectral  density  as  the  observation.  Fixed  sample  size  (FSS)  tests  were  considered  where 
the  test  operates  on  the  samples  in  a  A'  x  IV  time/frequency  grid.  The  design  objective  of 
the  detector  was  that  it  should  be  robust  with  respect  to  the  distribution  of  the  transient 
signal  energy  in  the  two  dimensional  time-frequency  space.  Moreover,  it  should  not  assume 
a  particular  model  for  the  transient  signal,  such  as  an  autoregressive  moving  average  model 
(ARMA).  Wolcin  derived  a  generalized  likelihood  ratio  detector  and  showed  that  in  certain 
situations  this  approach  outperformed  simple  energy  detection  by  requiring  a  lower  SNR  for 
50%  detectability.  By  combining  the  GLR  detector  with  energy  detection,  Wolcin  proposed 
to  achieve  good  performance  in  all  or  most  situations.  Because  the  time  when  the  transient 
occurs  is  not  known,  Wolcin  uses  a  sequential  algorithm  which  successively  performs  multiple 
FSS  tests.  We  now  review  Wolcin’s  procedure  in  some  detail. 

3.8.2  The  Robustness  Objective 

As  stated  above,  a  detector  is  desired  which  will  perform  well  for  a  wide  variety  of  Iran- 
sient  signals.  No  structure  is  assumed  for  the  process  causing  the  transient.  The  transient 
can  be  narrowband  in  nature  such  as  a  sinusoid  with  finite  duration  or  an  exponentially 
damped  sinusoid,  or  the  transient  can  be  broadband.  Since  it  is  assumed  that  the  tran¬ 
sient  signal  is  independent  of  the  background  noise,  the  presence  of  a  transient  results  in 
an  increase  in  the  total  energy  (power)  which  is  to  be  detected  by  processing  the  observa¬ 
tions  of  the  sample  energ>'  spectral  density.  The  actual  increase  in  ci,.‘’'gy  in  a  particular 
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time/frequency  bin  due  to  the  presence  of  a  transient  signal  is  assumed  unknown,  further 
complicating  the  detection  procedure. 

3.8.3  Unknown  Background  Energy  Spectral  Density 


In  addition  to  not  knowing  the  energy  spectral  density  of  the  transient,  the  ESD  of 
the  background  noise  is  assumed  unknown,  as  well.  The  background  ESD  is  assumed  to 
be  stationary  or  slowly  varying  with  time  so  that  the  following  spectral  estimation  can  be 
implemented.  The  time  samples  are  grouped  in  blocks  of  N  for  processing  via  the  Discrete 
Fourier  Transform  (DFT).  Let  A'fc„  denote  the  squared  magnitude  of  the  output  of  the  DFT 
for  the  A:’  h  time  block  and  n’th  frequency  bin.  Wolcin  uses  the  recursive  estimate  for  the 
unknown  ESD 

=  (1  -  (33) 

where  is  the  estimate  of  ESD  for  the  n’th  frequency  bin  at  time  k  and  where  o  is  a 
parameter  between  zero  and  one.  This  estimate  gives  past  observations  an  exponentially 
decreasing  weight  and  this  allows  the  estimate  to  change  slowly  with  time.  The  parameter 
a  is  chosen  to  compromise  the  objectives  of  speed  in  adapting  to  changes  in  the  background 
spectrum  versus  the  steady-state  variance  of  the  estimate.  After  initializing  the  estimate 
and  allowing  it  to  stabilize,  the  spectral  samples  can  be  normalized  or  whitened  by  the 
spectral  estimate  by 

^kn  ~  ^knl^kn- 


If  a  transient  signal  was  detected  in  a  particular  time/frequency  bin,  then  that  sample 
should  not  be  used  to  estimate  the  background  noise  level  so  that  (33)  is  modified  to 


Gk 


n 


5jt_i  „,  if  transient  detected  in  bin  {k,n) 

(1  -  Q)Gjt_i,„  +  oZjtn,  otherwise. 
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3.8.4  The  Fixed  Sample  Size  Transient  Detector 


For  a  fixed  number  of  blocks  K  and  frequency  bins  N ,  the  detectors  derived  are  fixed 
sample  size  detectors  based  on  observations  of  the  normalized  energy  samples  for 
k  =  1,2,..., A',  n  =  1,2, ...,iV.  In  order  to  simplify  the  derivation  of  a  detector,  the 
following  ideal  assumptions  are  made  about  the  distribution  of  the  (these  are  identical 

to  those  in  Section  4).  Under  the  null  hypothesis  Hq  when  only  the  background  noise  is 
present,  the  are  assumed  to  be  independent  and  identically  distributed  exponentially 

distributed  random  variables  with  a  mean  of  one.  (Note:  the  first  and  last  coefficients 
actually  have  distributions  but  we  assume  these  coefficients  are  not  used  for  simplicity.) 
This  would  be  the  case  if  the  input  to  the  DFT  was  a  white  Gaussian  process  and  if  we 
ignored  the  effect  that  the  normalization  process  has  on  the  distribution  of  the  energy 
samples.  Under  the  alternative  hypothesis  H\  where  noise  and  transient  signal  are  present, 
the  energy  samples  are  still  assumed  to  be  independent  exponentially  distributed  random 
variables,  but  now  sample  has  mean  where  >  1.  Thus  we  have  the  binary 
hypothesis  testing  problem 

(  K  N  \ 

Hq  :  p(X)  =  po(X)  =  exp  -  ^  A'jt,,  J  ,  0  <  Au  <  oo 

\  fc=l«=l  / 

A'  N 

Hi:piX)  =  pi{X)=  n  n  ~«xp(-.Yfc„/^t„),  0<  <00. 

t=l  n«l 

Assuming  the  parameters  {/ifcn}  known  the  optimal  detector  is  given  by  the  likelihood 
ratio  test 

Fi(X|£/)  >  Declare  Hi 

Po(X)  <”  =>  Declare //o- 

Since  the  means  {ptn}  are  unknown,  they  are  replaced  with  their  maximum  likelihood 


estimates  given  by 


if  A'tn  >  1  +  5o 
if  ''  in  <1+^0 
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where  5  is  a  parameter  which  we  may  choose  to  optimize.  The  resulting  detector  is  a 
generalized  likelihood  ratio  (GLR)  test  given  by 


K  N 


=»  Declare  H\ 
=>  Declare  Hq, 


where 


Y,  —  f  ^kn  ~  1  —  log  ^kn » 

~  1 0, 


if  >  1  +  5 

if  Xic„  <  1  +  S. 


This  detector  can  be  compared  with  energy  detection  which  is  given  by 

V  1  ^  ^  Declare  Hi 

2^^kn  ^  Declare  7/o- 

k=l n=l 

This  detector  implements  the  locally  optimal  detector  for  testing  >  l,fc  = 

1,2,..., A',  n  =  1,2, ...,iV”  versus  “/ifcn  =  =  1,2,..., A',  n  =  1,2,..., A”’  and  is 

uniformly  most  powerful  for  all  /i*  >  1. 

3.8.5  Performance 


In  order  to  determine  the  performance  of  the  proposed  detectors,  it  is  convenient  to 
relabel  the  K  x  N  observations  of  Xjc„  to  be  {Xn}^_l  where  N'  =  K  x  N ,  and  similarly 
for  Vfcn-  Thus  the  GLR  detector  is 

N' 

A  -  V  ^  h  Declare  Hi 

=:►  Declare //o, 

n=l 


and  the  energy  detector  is 


N' 


i?  =  5^  A'„  -  1  >  /i 


n=l 


=>  Declare  H\ 
Declare  //q  . 


Using  a  Central  Limit  Theorem  argument  for  large  N',  the  distributions  of  the  test  statistics 
A  and  B  under  the  two  hypotheses  are  approximately  Gaussian  with  the  appropriate  means 
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and  variances.  For  the  purpose  of  evaluating  the  performance,  it  is  assumed  that  the 
transient  has  uniform  strength  S  in  M  out  of  N'  bins.  (This  assumption  is  in  contrast  with 
the  one  made  in  Section  5.  There,  we  assumed  that  the  transient  had  uniform  strength 
in  M  out  of  N  frequency  bins  for  ejich  consecutive  time  block  during  the  duration  of  the 
transient.  They  need  not  be  the  same  frequency  bins  in  each  time  block.  Here,  any  M 
out  of  the  whole  rectangular  grid  of  K  x  N  =  N'  bins  can  have  transient  energy.  For 
example,  There  could  be  transient  energy  in  the  first  and  second  DFT  blocks,  none  in  the 
third  through  sixth,  and  some  in  the  seventh  and  eighth.  According  to  the  assumption  of 
Section  5,  this  example  would  be  considered  as  two  seperate  transients.  However,  it  would 
be  considered  only  one  transient  here.  It  should  be  noted  that  these  differences  are  not 
critical.  The  assumptions  were  made  only  to  make  the  performance  analysis  tractable.) 
Now,  using  the  Gaussian  assumption,  the  performance  can  be  determined  explicitly. 


For  the  GLR  detector,  the  mean  and  variance  of  the  summand  when  no  transient  signal 
is  present  in  bin  n  are 

=  ^o[in]  =  /  (x  -  1  -  logx)exp(-i)dx 
Jl+So 

roo 


<^0 


=  (T^(So,0)  =  Varo[ln]  =  /  (x  -  1  -  logx)'exp(-x)dx  - 

xl+So 

and  when  the  transient  energy  is  present  they  are 

"•  =  (’ifs) ''' 


cr} 


?  =  <t^(So,5)  =  Vari(y„]  =  (x  -  1  -  log exp 


(34) 


(Analytical  expressions  for  /i(5o,5)  and  cr^(So,5)  are  given  in  Section  3.8.6.)  Thus,  the 


detection  problem  becomes 

Ho:A^N(N'^o,N'al) 

^1  :  A  ~  A  (M/ii  +  {N'  -  Af)tio,Mai  +  (.V'  -  A/)cr5)  . 

Let  $(x)  be  the  standard  Gaussian  cumulative  distribution  function.  It  is  straightforward  to 
show  that  the  receiver  operating  characteristic  (ROC)  indicating  the  probability  of  detection 
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f3  versus  probability  of  false  alarm  oi  is 


^  =  l-$ 


y/W<TQ^  ^(1  -  g)  +  M(/xo  -  Ml) 


For  energy  detection,  it  easy  to  show  that  the  detection  problem  becomes 

f7o  :  5  -  N(0,iV') 


Hi  ;fl~N(MS,iV'  + A/S) 


so  that  the  ROC  is  given  by 


/3  =  l-$ 


-a)-  MS 
v/A'  +  A/S 


To  illustrate  the  performance,  Wolcin  determined  the  signal-to-noise  ratio  (SNR)  S 
required  per  bin  to  acheive  a  50%  detection  probability  as  a  function  of  the  number  of  bins 
M  with  transient  signal  energy.  In  this  case,  by  solving  (35)  and  (36)  with  3  =  0.5,  we  find 
that  for  the  GLR  detector 


= 


y/Wao^  ^(1  -  q) 


and  for  the  energy  detector 


-  a) 


Wolcin  found  that  the  GLR  detector  outperforms  the  energy  detector  for  lower  values  of  A/ 
whereas  the  reverse  is  true  for  larger  values  of  A/. 


3.8.6  Sequential  Implementation 


In  the  computer  algorithm  MAXTRAN  [23]  for  transient  detection,  Wolcin  modified 
the  fixed  sample  size  test  described  above  for  sequential  implementation  as  follows.  Each 
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time  a  new  DFT  is  computed,  say  number  t,  the  most  recent  K  DFT’s  numbered  i  —  A'  +  1 
through  i  are  used  to  generate  K  detection  statistics  for  ^  =  1, ...  A'  according  to 

1  •  ^ 
n=l 

for  the  GLR  detector  and 

=  t,  (38) 

*  j=t~k+l  n=l 

for  the  energy  detector.  For  each  i,  these  K  statistics  are  compared  to  a  single  threshold 
h,  and  if  any  one  of  the  statistics  exceeds  the  threshold,  a  detection  is  declared.  For  each 
fc,  the  statistic  is  in  effect  testing  the  hypothesis  that  a  transient  of  length  k  exists. 
The  normalization  in  (37)  and  (38)  by  \/fc  ensures  that  when  no  signal  is  present  the  K 
statistics  all  have  the  same  distribution.  For  the  GLR  detector,  the  distribution  is 

VF,*- Ar(0,iV4), 

and  for  the  energy  detector,  the  distribution  is 

fF*  ~  .V(0,A^) 


independent  of  k.  Now,  the  probability  of  false  alarm  of  each  individual  test  can  be  set 
at  the  same  value  by  using  a  single  threshold.  Unfortunately,  there  does  not  seem  to  be  a 
straightforward  relationship  between  this  probability  of  false  alarm  and  the  false  alarm  rate 
of  the  overall  sequential  test,  and  the  performance  of  the  sequential  test  was  not  evaluated 
in  [23].  The  probability  of  detection  can  be  lower  bounued  by  the  probability  that  the 
statistic  corresponding  to  the  true  length  r  of  the  transient  exceeds  the  threshold.  For  the 
GLR  detector,  that  is 


h  +  (.U/v/r  )(/io  -  /i] ) 
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and  for  the  energy  detection  it  is 


0> 


1  -  ^ 


h~{MI^)S 
.[iV  +  (A//r)5]^/^  ■ 


The  SNR  required  for  50%  detectability  is  easily  found  to  be 


Ml  -  MO  = 


•v/f/i 

~ir 


for  the  GLR  detector  and 


for  the  energy  detector. 


3.8.7  Analytic2il  Formulae 


Using  the  table  of  integrals  in  [21]  and  some  tedious  calculus  yields  the 


following  ex¬ 


pressions  for  /i(5(},5)  and  <t^(Sq,S)  defined  in  (33)  which  were  not  evaluated  in  [23]: 


/x(5o,  5)  =  exp  [1  +  5o  -h  5  -  log(l  -H  5o)]  +  Ei  (-\^) 

<t2(5o,  5)  =  exp  I I  [25o  +  5^  25o5  +  45  +  2S‘ 

+  log2(l  5o)  -  2log(l  -h  5o)  -  2Solog(l  +  5o)  -  251og(l  +  5o) 


-f  2Ei  ^ ~  -I-  log  ^  +  C(2) 


1  5i 


fc-Hl 


1  +  5/ 


-M^(5o,5^ 


where 


X 

Ei  is  the  exponentiaJ  integral,  Ei(x)  =  J 


C  is  Euler’s  constant,  C  =  0.5772156G49 . . ., 


C  is  the  Riemann  Zeta-function  for  which  (,(2)  =  rr^/G. 
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4 


The  Gabor  Representation 


4.1  Introduction 


Classical  time  domain  analysis  and  frequency  domain  analysis  are  the  cornerstones  of 
most  of  the  signal  processing  work  in  electrical  engineering.  These  approaches,  however, 
can  be  inadequate  in  a  variety  of  signal  processing  situations  as  the  following  example 
demonstrates.  Consider  a  single  musical  note  played  on  some  instrument,  say  a  flute,  for  a 
few  seconds.  The  human  ear  hears  only  a  single  frequency  being  played  and  yet  classical 
Fourier  theory  implies  that  this  is  not  the  case.  It  is  well  known  that  any  time-limited 
signal  has  a  Fourier  transform  or  frequency  spectrum  with  infinite  support.  It  is  apparent 
that  the  classical  theory,  despite  its  unimpeachable  validity,  is  inadequate  for  describing 
human  observation  and  perception.  This  inadequacy  was  addressed  by  developing  hybrid 
representations  for  signals  which  combine  the  temporal  and  spectral  aspects  of  a  signal, 
in  contrast  with  classical  analysis  which  deals  with  only  one  of  the  aspects  at  a  time. 
Approaches  that  will  not  be  discussed  here  are  the  “instantaneous  power  spectra”  [13]  and 
the  “evolutionary  spectra”  [17]  both  of  which  define  a  frequency  spectrum  that  is  localized 
at  a  particular  time  and  changes  with  time.  Other  hybrid  representations  are  the  Wigner 
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distribution  and  the  (cross)  ambiguity  function  [12].  Another  approach  was  explored  in  the 
seminal  paper  by  Gabor  in  1946  [5]. 

Gabor  proposed  representing  a  signal  s{t)  according  to  the  expansion 

OO  00 

^(0=  H  Cmn9it-otn)exp\j2irl3m{t- an)]  (1) 

m=— OO  n=— OO 

where  {Cmn]  aje  complex  coefficients,  g{t)  is  the  window  function  with  unit  energy,  and  a 
and  /?  are  real  parameters.  Gabor  originally  considered  a  Gaussian-shaped  window  function 
and  this  will  be  considered  in  more  detail  in  Section  3.  The  key  feature  of  the  representation 
(1)  is  that  the  coefficients  correspond  to  a  lattice  of  points  in  the  two-dimensional  space 
corresponding  to  time  and  frequency.  For  example,  by  using  a  rectangular  pulse  as  the 
window  function,  the  musical  note  which  was  inadequately  described  by  classical  represen¬ 
tations  can  be  represented  exactly  in  the  Gabor  representation  by  one  nonzero  coefficient 
for  a  particular  arrival  time  and  frequency,  and  this  representation  coincides  with  human 
perception.  Because  the  Gabor  representation  results  in  a  parsimonious  representation  of 
a  continuous  signal  that  combines  temporal  and  spectral  aspects,  it  has  recently  been  the 
subject  of  further  study  in  various  applications.  The  representation  was  first  applied  by 
Gabor  to  the  analysis  of  hearing  [5],  and  has  more  recently  been  used  in  connection  with 
image  representation  and  vision  analysis  [15,16],  and  with  transient  signal  analysis  [4]. 

Throughout  this  chapter,  the  signal  s{t)  is  assumed  to  be  sufficiently  well  behaved  so  as 
to  admit  a  Gabor  representation.  The  existence  and  uniqueness  of  the  Gabor  coefficients  has 
been  investigated  by  various  authors  and  references  can  be  found  in  [7].  The  representation 
is  least  problematic  when  o/J  =  1  and,  as  has  been  done  frequently  throughout  the  literature, 
this  condition  is  adopted  here,  too.  Since  the  time  axis  in  (1)  can  be  scaled  according  to 
t'  =  t/a  and  the  window  function  can  be  scaled  according  to  g'{t)  =  \/l//3  giill3),  it  suffices 


93 


to  consider  the  case  o  =  ^9  =  1.  Now,  the  representation  is  simplified  to 

OO  00 

S  Cmng{t  -  n)exp{j2mxt).  (2) 

m=— OO  n=— 00 

Signals  with  forms  similar  to  (1)  or  (2)  have  been  considered  in  other  contexts  outside 
the  signal  representation  framework  [19].  The  signal  in  (2)  could  model  a  multiuser  envi¬ 
ronment  with  a  single  channel  where  each  user  is  assigned  a  frequency  index  m  and  each 
user  transmits  a  bit  sequence  modulated  by  the  window  function.  Alternatively,  the  window 
function  can  represent  the  channel  spreading  which  causes  intersymbol  interference  (ISI).  In 
the  multiuser  context,  the  symbol  alphabet  of  the  coefficients  {Cmn}  is  finite  as  in  digital 
communication  where  Cmn  can  take  only  the  values  -)-l  and  -1.  However,  in  the  context 
of  the  Gabor  representation,  the  coefficients  can  be  arbitrary  complex  numbers.  At  first, 
this  might  seem  a  complication  of  the  multiuser  situation,  but  actually  it  is  a  simplification. 
Now,  any  optimization  done  over  the  coefficients  is  an  unconstrained  optimization  and  this 
will  result  in  analytically  tractable  solutions,  in  contrast  with  solutions  which  implement 
the  Viterbi  algorithm  in  the  multiuser /ISI  setting. 

In  Section  2,  the  determination  of  the  Gabor  coefficients  is  discussed  and  two  methods 
are  compared  when  the  desired  signal  is  corrupted  by  noise.  In  Section  3,  the  representation 
is  considered  when  a  Gaussian-shaped  window  function  is  used,  which  is  the  most  comir  ^niy 
used  window  function  in  the  literature.  In  Section  4,  an  exponential  window  function  is 
considered  and  the  representation  is  applied  to  transient  signal  detection  for  finite  observa¬ 
tion  intervals.  The  results  of  this  section  are  extended  to  infinite  observation  intervals  in 
Section  5,  resulting  in  a  sequential  detection  algorithm  for  transient  signal  detection. 


4.2  The  Gabor  Coefficients 

4.2.1  Introduction 

The  transform  pairs,  the  forward  transform  and  the  inverse  transform,  related  to  the 
various  Fourier  representations  —  the  Fourier  transform,  the  Fourier  series,  and  the  discrete 
Fourier  transform  —  are  used  frequently  in  the  signal  processing  community.  The  special 
property  that  makes  these  transform  pairs  tractable  is  the  orthogonality  of  sinusoids.  As 
is  obvious  from  the  Gabor  representation  (2),  the  basis  functions  are  not  orthogonal  for 
general  window  functions.  Consequently,  the  forward  tramsform  to  compute  the  Gabor 
coefficients  is  considerably  more  complicated  than  in  the  more  familiar  transforms. 

4.2.2  The  Wigner  Transform 

Before  considering  the  methods  for  the  forward  transform,  it  is  instructive  to  look  at 
the  generalized  Wigner  transform,  also  called  the  complex  spectrogram.  Consider  a  window 
function  5(0  of  unit  energy  and  define  the  generalized  Wigner  transform  5(t.w)  of  a  signal 
s(0  according  to  [1,6,12] 

OO 

S{'r,f)=  J  3{t)g*{t-T)exi>{-j2Trft)dt,  (3) 

— OO 

where  g*{t)  is  the  complex  conjugate  of  g{t).  Then,  the  inverse  transform  is  given  by  [see 
12  for  details] 

OO  OO 

^(0=  J  J  S{T,f)g{t-T)exp{j2Trft)dTdf.  (4) 

— OO  — OO 

Discretizing  the  double  integral  in  (4)  by  using  a  rectangular  approximation  and  sampling 
th^  integrand  at  r  =  0,  ±1,±2, . . .  and  at  /  =  0,  ±1,  ±2, . . .  results  in  the  following  approx¬ 
imation 

OO  OO 

H  5(m, Ti)5(t  -  n)exp(j2m7rO.  (5) 

m=— OO  n=— OO 

Except  for  the  approximation,  (5)  is  exactly  the  Gabor  representation  in  (2).  This  suggests 
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similarly  discretizing  the  integral  in  (3)  to  approximate  the  coefficients  as  follows 

oo 

5(m, n)  as  ^  s{k)g*{k  —  n)exp{-j2‘KTnk) 

(6) 

=  E  msHk-o). 

Jbsz— oo 

Unfortunately,  the  approximation  is  poor  and  is  not  even  a  function  of  the  frequency  index 
m.  Nevertheless,  it  is  apparent  from  (5)  that  the  Gabor  representation  and  the  Wigner 
transform  are  related.  These  relationships  are  explored  in  [1,2,7]. 

4.2.3  The  Gabor  Coefficients  via  the  Biorthogonal  Function 

There  are  two  approaches  to  computing  the  Gabor  coefficients.  One  has  been  used 
widely  in  the  literature  while  the  other  has  been  curiously  overlooked  in  the  work  on  the 
Gabor  representation,  despite  the  fact  that  the  tools  and  techniques  have  been  available 
for  quite  some  time.  The  popular  approach  is  presented  first.  Details  can  be  found  in 
[1, 2,3,4, 7]. 


For  a  signal  x(0,  define  its  Zak  transform  (2x)(x,/)  according  to 

oo 

(2i)(z,/)=  51  i(z-k)exp(-;2kx/). 

k=—(X) 

Then,  the  Gabor  coefficients  for  s{t)  can  be  computed  from 

Cmn  =  J  J  ('2g)(i’'/)  exp[-j2ff(m;:  +  nf)]dzdf. 

0  0 

A  more  convenient  formula  is  obtained  by  considering  time  domain  processing  instead  of 
Zak  transform  domain  processing  as  follows.  Define  the  function  7(/)  according  to 

{Zgy(tjy 

assuming  this  integral  exists.  The  function  7(t)  is  called  the  biorthogonal  function  of  g{t) 


because  it  exhibits  the  property 


rOO 

I  g(t  ~  n)j*{t  -  /)  exp[j2x(7n  -  k)t]dt  =  6(n  -  l)6(m  -  k). 

•/  — OO 
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(8) 


Using  this  property,  the  Gabor  coefficients  can  easily  be  seen  to  be  given  by 

00 

Cmn  —  /  —  n)exp{—j2Tnnt)dt. 

—00 

This  method  of  computing  the  coefficients  will  be  called  the  biortbogonal  function  method 
(BFM). 

4.2.4  The  Gabor  Coefficients  via  the  Projection  Theorem 

The  second  approach  to  computing  the  Gabor  coefficientf  can  be  developed  in  two  ways, 
one  using  a  Hilbert  space  approach  and  one  using  a  random  variable  approach.  We  consider 
the  Hilbert  space  development  first. 

It  will  be  assumed  that  the  functions  {/mn*  m,n  6  2}  are  linearly  independent  over 
the  observation  interval  7  C  IR  where 

/mn(0  =  -  n)exp(;2mx<).  (9) 

If  the  biorthogonal  function  exists,  then  the  following  lemma  establishes  the  linear  indepen¬ 
dence  of  the  functions  when  defined  over  IR. 

Lemma.  If  the  biorthogonal  function  ‘y(t)  corresponding  to  g{t)  exists,  then  the  set  of 
functions  defined  by  (9)  are  linearly  independent  over  IR. 

Proof.  Assume  that  the  functions  are  linearly  dependent.  This  implies  that  for  some  k 
and  I  and  coefficients  {Cmn} 

fklW  —  Cmn/mn(0 

m^k  n^l 

or  equivalently 

g(t  -  I)exp{j2kiri)  =5^53  ^mn9{t  -  Ti)exp(j2m7rt)  (10) 

mjik  njil 
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for  all  t  G  IR.  Multiplying  both  sides  of  (10)  by  -  /)exp(-j2;-7rO  and  integrating  over 
IR  yields 

oo  oo 

J  ^Cmn5(i-n)  J  g{t~n)^*(t-l)e\j>[j2{m-k)i:t]dt.  (11) 

-00  m^kn^l 

By  the  biorthogonality  property  (7),  the  left  hand  side  of  (11)  is  equal  to  1  while  the  right 
hand  side  is  equal  to  0.  Thus,  the  functions  must  be  linearly  independent.  | 

Let  g{t)  be  a  function  in  L2{I),  the  Hilbert  space  given  by 

^2(/)  ={/:/-<!:;  !l/!lij<oc} 
where  the  norm  |l  •  Ij^j  is  induced  by  the  inner  product 

if^g)  =  (12) 

so  that  W/Wli  =  {/,  Consider  the  space  of  functions  G(I)  defined  on  /  given  by 

00  OO 

G{.I)  =  {f  £  L2{I) f  =:  Y2,  ^  C'mnS'(t  -  n)exp(j2mxt)  for  some  G€}. 

m=— OO  n=— 00 

It  is  apparent  from  its  definition  that  G{I)  is  a  linear  vector  space.  Since  G{I)  c  ^2(7), 
the  inner  product  on  L2{I)  in  (12)  is  also  an  inner  product  on  G(7).  Thus  G(7)  is  a  pre- 
Hilbert  space.  In  order  to  show  that  G(/)  is  a  Hilbert  space,  G(/)  must  be  shown  to  be 
complete,  that  is,  any  Cauchy  sequence  in  G(/)  must  converge  to  an  element  in  G(/).  This 
is  demonstrated  by  the  following  theorem. 

Theorem.  G(/)  is  Hilbert  space. 

Proof.  From  the  discussion  above  it  suffices  to  show  that  G(7)  is  complete.  First,  the 
lattice  of  points  L  given  by 

L  =  {(m,n)  :  m,n  G  Z} 
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is  linearized  and  ordered  as  depicted  in  Figure  1.  Now,  instead  of  the  functions  {/mn;  Tn,n  E 
Z}  in  (9),  we  have  {/„;  n  E  IN}. 
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•.•■4-— 4" 
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00  _ 

9 

10  :  : 
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22  j23 

24 
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••4-— 4 . 

. I . 
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Figure  1.  An  ordering  of  the  points  in  L. 

By  the  Gram-Schmidt  orthogonalization  procedure  [10],  a  countable  set  of  orthonormal 
functions  {e„;  n  G  IN}  is  obtained  satisfying  for  each  n  G  IN 

span{7i,72,...,7n}  =  span{ei,e2,...,e„}. 

Let  /  be  a  function  in  G{I),  Using  the  new  ordering  of  the  lattice  and  the  new  orthonormal 
spanning  vectors,  /  can  be  written  as 

/(O  =  ^  ^n€n{t)  (13) 

n=l 

for  t  E  I.  Since  ||/||^  <  oo,  it  follows  that  J2n\^n\~  <  oo.  In  other  words,  {Ai,/l2,...}  is 
in  I2,  the  Hilbert  space  of  square  summable  sequences  with  the  ordinary  Euclidean  inner 
product  and  norm  denoted  by  ||  •  ||/j.  Let  {f^]  be  a  sequence  in  G{I)  so  that  in  the  form 
(13), 

/‘‘it)  =  E  A^„en{t), 


99 


and  let  be  the  element  in  I2  given  by  =  {ylj, •  •  •}•  Since 

ii/‘  -  nii,  =  i:  -  Kf  =  ii^‘  - 

n=l 

{/*}  is  a  Cauchy  sequence  in  G{I)  if  and  only  if  is  a  Cauchy  sequence  in  l2-  Since 
I2  is  complete,  converges  to  x°°  =  in  l2-  It  now  suffices  to  show  that 

converges  to  the  function  given  by 

00 

nt)  =  53  A-en(0 

n=l 

which  is  obviously  in  G(/).  Now, 

00 

ll/‘  -  /“Hi,  =  E  i4  -  -  x“ii?,  —  0 

n=l 

as  A:  — ►  0,  which  completes  the  proof.  ■ 

Now  that  it  has  been  established  that  G[I)  is  a  Hilbert  space,  the  Projection  Theorem 
and  its  corollaries  can  be  exploited.  Consider  a  finite  collection  of  functions  {/] ,  /2.  •  ■  ■ .  jn) 
in  G{I)  where 

/i(0  =5(<-/,)expO'2A'i7rt) 

for  some  and  for  i  =  1,2, ...  ,n.  An  arbitrary  function  h  e  G[I)  can  be  projected  onto 
the  closed  subspace  5  generated  by  the  collection  of  functions,  and  the  result  is  the  unique 
vector  h  in  S  which  is  closest  to  /.  The  minimizing  vector  h  can  be  expressed  as 

t-i 

The  {a,},  which  are  the  Gabor  coefficients  for  h,  can  be  computed  from  the  norma.!  equa¬ 
tions: 

(fl,f,}ai  +  (f2,f,)Q2  +  ---+(fnyf,}Qn  =  (h, /,)  (14) 

for  i  =  1,2, . . . ,  n.  In  some  situations,  such  as  when  a  finite  observation  interval  is  used  or 
when  frequency  indices  higher  than  some  number  are  not  allowed,  it  makes  sense  to  consider 
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the  projection  of  the  function  onto  some  subset  of  the  Gabor  basis  functions.  Moreover, 
when  h  €  L2{I)  but  h  ^  G{I),  the  Projection  Theorem  still  holds.  Thus,  ti'is  method  gives 
the  optimal  vector  in  the  desired  subspace  in  the  sense  of  minimum  distance,  whereas  the 
method  using  biorthogonal  function  will  yield  a  suboptimal  vector.  Thus,  the  Hilbert  space 
method  would  be  preferred  when  h  £  G(I)  is  corrupted  by  noise.  This  will  be  discussed 
in  further  detail  later.  When  all  the  Gabor  coefficients  of  a  function  h  £  G{I)  are  desired, 
the  normal  equations  are  still  valid  but  now  an  inf  ite  system  of  linear  equations  has  to 
be  solved.  The  infinite  system  of  equations  can  be  solved  iteratively  by  solving  the  finite 
set  of  equations  and  by  increasing  the  subspace  onto  which  the  function  is  projected  by  one 
dimension  on  each  iteration  as  the  following  lemma  demonstrates. 

Lemma.  Let  h  £  G(I).  Let  -  be  an  ordering  of  the  Gabor  functions.  Let  fi„  be 

the  projection  of  h  onto  the  subspace  generated  by  {/i, /2,  •  •  • , /n}-  Then  h„  -*  h. 

Proof.  It  is  easy  to  check  that 

E  Ki-  (15) 

i=n+l 

where  a,  is  the  coefficient  of  e,  in  the  orthonormal  representation  of  h.  Since  {01,02'  •  •  •}  € 
I2,  the  right  hand  side  of  (15)  tends  to  zero  as  n  — ♦  0,  which  completes  the  proof.  ■ 

Although  this  theorem  guarantees  that  the  finite  dimensional  solution  k„  given  by 

hn(t)  =  ZA"/,(t) 

1=1 

converges  to  h  in  G(/),  it  does  not  guarantee  that  the  individual  Gabor  coefficients  A" 
converge  to  the  Gabor  coefficients  of  h.  Theorems  and  conditions  for  the  convergence  of 
the  solutions  of  the  finite  systems  of  linear  equations  to  the  solution  of  the  infinite  system 
of  linear  equations  can  be  found  in  [8]. 
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4.2.5  The  Gabor  Coefficients  via  Maximum  Likelihood  Estimation 

The  method  involving  the  normal  equations  can  be  arrived  at  from  a  random  processes 
approach  and  this  will  further  extend  the  optimal  properties  of  this  approach.  Consider  a 
signal  s(t)  =  s<({C„,„})  with  the  Gabor  representation  (2)  corrupted  by  additive  complex 
white  Gaussian  noise  n{t)  with  zero  mean  and  autocorrelation  function  £[7j(s)n*(<)]  = 
NqSit-  s).  Thus,  the  received  signal  r{t)  is  r{t)  =  s{t)  +  n{t).  Given  the  Gabor  coefficients, 
the  likelihood  of  the  received  waveform  is 

€  /IKCmn;  rn,n  e  2}]  =  /lexp[fi({C„»„))/2.Vo]  (16) 

where 

Qi{Cmn})  =  2^y^sU{Cmn})drt^  -  j^\Si{{Cmn])\^dt.  (17) 

and  v4  is  a  positive  constant  which  does  not  depend  on  {Cmn}-  Setting 

Zmn  =  ^  9*{t  -  n)exp(-j2mx<)drt,  (18) 

we  have  that 

00  00 

I  S*f{{CTnn})drt  =  ^  .nn^rnn-  (19) 

m=— oon=— oo 

In  light  of  (16)-(19),  it  follows  that  the  statistics  {Zmn',  m,n  6  2}  are  jointly  sufficient 
statistics  for  {Cmn;  rn,n  ^  2}.  The  second  term  on  the  rigM  hand  side  of  (17)  is  easily 
shown  to  be 

f  \sti{Cmn})\^dt  =  X!  H  X]  ^klUkl,  fmn)Cmn-  (20) 

•’1  k=—ool—-oom=—oon=-oo 

Order  the  coefficients  Cmn  in  a  vector  C,  for  example  using  Figure  1,  and  similarly  order 
the  statistics  Zmn  in  a  vector  Z.  Corresponding  to  this  ordering,  define  the  signal  cross 
correlation  matrix  R  whose  entries  contain  the  inner  products  of  the  form  {fkhfmn)-  Now 
(17)  can  be  rewritten  as 

fi(C)  =  2Jf{C^Z}  -  C^RC  (21) 
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where  is  the  complex-conjugate  transpose  of  C.  The  vector  C  which  maximizes  (21) 
is  the  maximum  likelihood  estimate  of  C  and  is  given  by  the  solution  of  the  infinite  system 
of  linear  equations 

RC  =  Z,  (22) 

which  for  R  invertible,  has  solution  C  =  R“^Z.  This  is  exactly  the  result  obtained  from 
the  Hilbert  space  approach  and  (22)  are  the  normal  equations.  If  instead  of  considering  all 
the  Gabor  coefficients  only  a  finite  number  are  permitted  to  be  non-zero,  then  the  above 
derivation  can  be  used  to  find  the  MLE  of  C  which  will  result  in  the  finite  system  of  normal 
equations  in  (14).  This  corresponds  to  the  projection  onto  the  finite  dimensional  subspace 
of  Gabor  functions.  When  R  is  invertible  in  the  infinite  system  of  linear  equations,  C  is  an 
unbiased  estimate  of  C,  as  is  demonstrated  below.  Since  E[Zmn]  is  given  by 

{OO  OO  I 

Ckig{t  ~  l)exp{j2kirt)  +  n{t)\  dt 
lt=— oo/=— OO  J 

OO  OO  - 

=  ^  ^  Cjfi  I  g*{t  -  n)exp{—j2mirt)g{t  -  l)ex'p{j2kzt)dt, 

fc=-oo/=-oo 

it  follows  that  E[Z]  =  RC.  Since  £[C]  =  R“^£[Z],  £[C]  =  C.  As  done  with  the  Projection 
Theorem,  it  can  be  shown  that  the  finite  dimensional  approximations  3”(<)  converge  to  5(f) 
in  the  mean-square  sense,  but  C”  corresponding  to  the  approximations  need  not  converge 
in  expectation  to  the  true  Gabor  coefficients  C.  However,  if  the  signal  is  such  that  only 
a  finite  set  of  coefficients  are  nonzero,  then  the  finite  solution  of  the  normal  equations  is 
unbiased,  as  well. 

The  distribution  in  (16)  is  of  exponential  family  form  and  is  of  full  rank  because  of 
the  linear  independence  of  the  Gabor  functions.  Since  C  is  a  function  of  the  sufficient 
statistics  and  is  unbiased,  it  follows  from  [9,  Theorem  2.1.2]  that  each  element  in  C  is  a 
uniformly  minimum  variance  unbiased  estimate  of  the  corresponding  element  in  C.  Thus, 
in  the  presence  of  white  Gaussian  noise,  the  Gabor  coefficients  computed  by  the  projection 


103 


method  will  have  smaller  variances  than  those  computed  with  the  biorthogonal  function. 
Note  that  the  procedure  above  is  easily  modified  when  the  noise  is  non-white  Gaussian  noise, 
in  which  case  the  processing  is  similar  to  the  one  given  by  the  Gauss-Markov  Theorem  [10]. 
The  optimality  properties  of  this  method  still  hold. 

The  philosophical  difference  between  the  approach  using  the  biorthogonal  function  and 
the  approach  using  the  normal  equations  can  be  clarified  through  the  analogy  with  inter- 
symbol  interference  or  a  multiuser  channel.  If  a  single  data  symbol  is  desired  in  the  ISI  case 
or  a  single-user  receiver  in  the  multiuser  channel,  then  one  approach  is  to  process  the  data 
so  as  to  null  out  the  interfering  symbols  or  users.  The  other  approach  is  to  simultaneously 
estimate  aU  the  symbols  or  users  and  in  essence  “subtract  out”  the  undesired  ones  from 
the  original  signal.  When  the  interference  has  a  structure  that  is  known  a  priori,  then  this 
can  be  exploited  in  the  simultaneous  estimation  and  the  second  approach  should  be  better. 
For  signals  with  the  Gabor  representation,  filtering  with  the  biorthogonal  function  nulls 
out  all  the  basis  functions  except  the  one  of  interest,  but  this  may  amplify  the  noise  at  the 
same  time.  On  the  other  hand,  the  solution  of  the  normal  equations  is  an  implementation 
of  simultaneous  estimation  of  the  Gabor  coefficients  and  deals  with  the  interference,  the 
desired  signal,  and  the  noise  in  a  unified  approach. 

4.2.6  The  Distribution  of  the  Gabor  Coefficients 

In  order  to  determine  the  performance  of  the  two  schemes  in  estimation  and  detection 
problems,  it  is  necessary  to  determine  the  distribution  of  the  coefficients  when  the  schemes 
are  applied  to  signals  embedded  in  zero-mean  white  Gaussian  noise.  As  both  approaches 
process  the  data  in  a  linear  fashion,  the  sequence  of  estimated  Gabor  coefficients  for  white 
Gaussian  noise  calculated  by  either  method  is  itself  a  Gaussian  random  process.  Thus,  it 
is  sufficient  to  determine  the  first  and  second  order  moments  of  the  process.  Since  both 
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schemes  yield  unbiased  estimates  of  the  Gabor  coefficients  for  a  signal  with  the  Gabor 
representation  embedded  in  zero  mean  noise,  the  means  of  the  estimated  coefficients  are 
the  true  Gabor  coefficients. 


The  second  order  moments  are  as  follows.  Let  n{t)  represent  the  noise  with  autocorre¬ 
lation  function  £[n(s)n*(t)]  =  NQ6{t  -  s).  Then,  the  second  order  moments  for  the  method 
using  the  biorthogonal  function  is  easily  seen  to  be 

00 

E[{Cki-Cki){C;,n-C*mn)\  =  NQ  /  -  l^t  -  n) -  k)t]dt 

—  OO 
OO 

=  iVo  /  7*(07(^  —  (n  -  /))exp[j27r(m  -  k)i]dt. 

—  OO 

In  particular,  the  variance  of  each  coefficient  is 

OO 

Var[Cmn]  =  iVo  J  (24) 

—00 


When  the  maximum  likelihood  estimates  for  the  Gabor  coefficients  are  used,  the  esti¬ 
mates  are  given  by  C  =  R“^Z  as  given  in  the  previous  section.  It  is  easy  to  check  that 
under  the  white  Gaussian  noise  assumption  the  vector  of  sufficient  statistics  Z  has  covari¬ 
ance  matrix  given  by  £[ZZ^]  =  NqK.  Thus,  the  covariance  matrix  of  the  coefficients 
is 

£[(C  -  C)(C  -  C)^]  =  R-^NoRR-^  =  NqR-\ 

4.2.7  Practical  Computation  of  the  Gabor  Coefficients 

In  order  to  compute  the  Gabor  coefficients  by  either  method  described  above,  the 
integrals  in  (8)  or  (18)  are  computed.  In  real  applications,  the  integrals  would  be  computed 
by  approximating  them  with  finite  sums.  If  the  biorthogonal  function  has  infinite  support. 
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it  is  necessary  to  truncate  it  to  an  interval,  say  (-/,/]•  Then,  the  integral  is  approximated 


Cmn  «  J  ^  S  '>'(*  ■*■  7)  ^  l) 

where  L  samples  are  taken  per  second.  The  right  hand  side  of  (25)  can  be  computed  effi¬ 
ciently  using  the  Fast  Fourier  Transform  (FFT).  Similarly,  truncating  the  window  function 
if  necessary,  the  sufficient  statistics  in  (18)  are  approximated  by 

Zmn  «  J  S  +  7)  7) 


which  can  also  be  computed  with  the  FFT.  The  only  difference  between  (25)  and  (26)  is  the 
use  of  the  biorthogonal  function  in  (25),  while  (26)  uses  the  window  function  or  matched 
filter. 


4.2.8  The  Extension  to  Real  Signals 

Both  of  the  methods  described  are  easily  extended  to  real  signals  of  the  following  form: 

00  00 

s(t)  =  ^  Amngit  ~  n)cos{27rmt  -  emn)-  (27) 

m=l  n=— 00 

If  both  the  window  function  and  the  biorthogonal  function  are  real  valued,  it  can  be  checked 
that  the  Gabor  coefficients  {Cmn}  of  a  real  signal  such  as  the  one  in  (27)  will  satisfy 
C-m,n  =  Cmn-  ^  is  assumed  this  is  so.  By  expanding  the  cosine  term  into  complex 
exponentials,  it  is  easily  seen  that 

Aqji  COs(0On)  ~  ^On 

Amn  COs(^n»n)  —  Cmn  +  C^m,n  —  2^{Cmn} 


■4mn  sin(dnin)  ~  (C'mn  “  C—m,n)lj  —  2Q{Cnin} 


—  4\Cmn\“ 

mn  )  _  0{C  mn  } 

laint/mn)  -  -7:;  ^  - r- 

'^mn  “ m.n 
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These  relationships  hold  for  both  methods  discussed  for  complex  signals  operating  on  the 
real  signal  in  (27)  in  the  absence  of  noise. 


For  both  methods,  care  must  be  taken  when  the  signal  (27)  is  corrupted  by  additive 
white  Gaussian  noise.  It  no  longer  makes  sense  to  consider  complex  Gaussian  noise  since 
the  imaginary  part  can  be  removed  without  loss  simply  by  taking  the  real  part  of  the  signal. 
Thus,  the  noise  is  assumed  to  be  real  without  loss  of  generality.  When  the  biorthogonal 
function  is  used,  this  assumption  does  not  affect  the  computation  of  the  coefficients.  It 
does,  however,  affect  the  distribution  of  the  estimates  of  the  coefficients.  Now,  in  addition 
to  the  second  order  moments  in  (23),  the  following  moments  are  required  to  fully  specify 
the  distribution: 

oo 

^K^kl  -  Cki)(Cmn  -  Cmn)]  =  ^0  J  ^(i  “  0l'(<  -  ")  exp[;27r(m  +  k)t]dt.  (28) 

— OO 

The  distribution  of  the  real  and  imaginary  parts  of  the  Gabor  coefficients  can  be  determined 
from  the  moments  in  (23)  and  (28).  See  [3,4]  for  an  e.xample  of  this  procedure. 


For  the  maximum  likelihood  method,  the  complex  sufficient  statistics  {Zmnj  for  the 
complex  case  are  still  jointly  sufficient  for  the  real  signal  in  noise  case.  Equivalently,  the 
statistics  {3fi(Zmn),  ^(^mn)}  are  a  set  of  real  statistics  which  are  jointly  sufficient.  Now, 
however,  the  signal  cross  correlation  matrix  R  must  contain  the  real  inner  products 


J^9i^  -  l)g{t  -  n)cos(2xA:f)cos(2;rm()df, 
J  g{t  —  l]g{t  —  n)cos(27r4:<)sin(27rmi)<ft, 
-  l)g{t  -  n)sin(2xl:/)sin(27rmf)dt. 


These  correspond  to  the  correlation  between  and  J?{Z„,„},  between  Jf{Zi/}  and 

^{Zmn},  and  between  9{Zfc/}  and  0{Z  mn}i  respectively.  The  procedure  for  computing 
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the  maximum  likelihood  estimates  proceeds  as  before  but  with  the  new  matrix  and  suffi¬ 
cient  statistics.  As  before,  the  distribution  is  completely  determined  by  the  signal  cross¬ 
correlation  matrix  R.  An  example  of  this  procedure  is  given  in  Section  4  for  the  exponential 
window  function. 


4.3  The  Gaussian  Window  Function 


The  original  window  function  that  Gabor  considered  in  his  representation  is  the  Gaus¬ 
sian  window  function  given  by 

jr-].  (29) 

The  parameter  controls  the  width  of  the  window  and  the  other  constants  were  chosen 
so  that  /  \g{t)\^dt  =  1.  The  Gaussian  window  function  has  the  property  that  the  transform 
5(/)  of  the  translated  and  sinusoidally  modulated  window  s(t), 

/  2  \ 

^(0  =  \j2j  exp[-7r(t  -  to)‘/’'‘]exp[j(27r/o<  +  <^)]  (30) 

has  the  same  form  as  s(t),  namely, 

S{f)  =  (2r2)‘/^  exp(-7rr*(/  -  /o)']exp{-j[27rfo(/  -  fo)  -  <!>]}■ 

Let  At  be  the  root  mean  square  duration  of  a  signal  x{t)  and  let  A/  be  the  root  mean 
square  duration  of  the  Fourier  Transform  of  x{t).  The  celebrated  uncertainty  principle 
requires  that  AtA/  >  ff.  Equality  is  achieved  for  the  sinusoidally  modulated  Gaussian 
pulse  in  (30).  This  property  implies  that  this  signal  has  the  maximum  resolution  in  the 
time/frequency  space.  This  was  the  motivation  for  the  choice  of  the  Gaussian  window 
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function  in  the  Gabor  representation.  The  Gaussian  window  function  was  later  used  in  the 
two-dimensional  setting  of  vision  analysis  and  image  representation  [15,16].  Throughout 
the  literature,  the  biorthogonal  function  of  the  Gaussian  window  in  one  dimension  (or  two 
for  images)  has  been  used  exclusively  to  compute  the  Gabor  coefficients. 

The  biorthogonal  function  for  the  Gaussian  window  was  found  to  be  [1,2] 


7g(0  =  (2r2)-l/'‘ '''■exp(rtV)  ^  (-l)”exp 
^  '  nX/r-l/2 


where  A'o  =  1.85407468...  is  the  complete  elliptic  integral  for  the  modulus  \I\J2.  The 
window  function  and  biorthogonal  function  are  shown  in  Figure  2  for  =  1,  and  in  Figure 
3  for  =  0.5.  It  is  clear  that  for  smaller  values  of  r  for  which  the  Gaussian  pulse  is  very 
narrow,  the  biorthogonal  function  concentrates  its  energy  in  areas  where  the  Gaussian  pulse 
is  smallest  in  order  to  compute  the  coefficients.  Therefore,  in  the  presence  of  noise  most 
of  the  signal  energy  is  ignored  and  the  noise  is  amplified.  This  is  in  contrast  to  matched 
filtering  where  the  filter  amplifies  portions  with  large  signal  values  and  attenuates  portions 
with  small  signal  values. 

As  pointed  out  in  [7],  7g(0  is  not  square  integrable.  It  follows  from  (24)  that  if  the 
biorthogonal  function  is  used  in  the  presence  of  white  noise,  the  variance  of  the  resulting 
Gabor  coefficients  would  be  infinite.  However,  in  any  real  application  the  function  would 
have  to  be  truncated  to  be  realizable,  and  since  the  biorthogonal  function  is  bounded,  the 
variance  of  the  Gabor  coefficients  would  be  finite.  Using  a  truncated  biorthogonal  function 
results  in  biased  estimates  of  the  coefficients.  Thus,  a  balance  must  be  struck  between  the 
estimation  bias  and  the  estimation  variance  when  truncating  the  biorthogonal  function. 
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(a) 


(b) 


Figure  3.  (a)  The  Gaussian  window  function  gci^)  and  (b)  the  biorthogonal  window 
function  7g(0  for  =  0.5. 


Ill 


In  order  to  use  the  projection  method,  the  signal  cross  correlations  are  required: 
oo 

{fkhfk+m,l+n)=  j  gcii  -  1)9g{^  -  i  -  n)exp{j2irTnt)dt 

—  OO 

OO 

=  (-1)'"”  J  gait  -  nl2)gc{i  +  Tj2)e\^{j2::mt)dt 

—  OO 

=  J  v/2exp  +  n/2)2  ^^^^2nmt)dt 

=  (-ir"exp(-i2^)exp(-|^)  (3.) 

=  r(m,n). 

Note  that  the  expression  in  (31)  “almost”  factors  into  a  product  of  two  terms,  one  which 
depends  only  on  the  time  difference  n  and  one  which  depends  only  on  the  frequency  differ¬ 
ence  m.  Had  the  expression  factored,  the  inversion  of  the  signal  cross  correlation  matrix 
R  which  is  required  for  the  maximum  likelihood  estimates  would  split  into  the  product  of 
the  inversions  of  the  two  factors.  If  it  was  known,  for  example,  that  only  even  frequency 
indices  m  can  have  non-zero  Gabor  coefficients,  then  (31)  would  factor  and  would  result  in 
simpler  processing.  This  factoring  would  be  especially  useful  in  a  sequential  computation  of 
the  Gabor  coefficients  because  of  the  decoupling  between  the  delay  index  and  the  frequency 
index.  For  sequential  processing,  there  may  still  be  the  promise  of  reduced  computation  due 
to  the  unusual  form  of  (31),  and  this  could  be  an  area  of  study  for  practical  implementation. 

Although  the  linear  independence  of  the  Gabor  functions  insures  linear  independence 
of  the  columns  of  the  signal  correlation  matrix,  the  doubly  infinite  matrix  need  not  be 
invertible.  It  must  be  positive  semidefinite  though.  Numerical  difficulties  were  encountered 
in  the  inversion  of  finite  signal  cross- correlation  matrices  with  large  time  and  frequency 
dimensions,  and  it  is  hypothesized  that  the  infinite  matrix  is  not  invertible.  This  does 
not  mean  that  a  unique  solution  does  not  e.xist  for  the  normal  equations.  However,  even 
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when  a  unique  solution  does  exist,  it  would  need  to  be  computed  iteratively  by  solving  the 
finite  system  of  equations  for  progressively  higher  order  systems  [8].  This  is  assumed  to  be 
impractical  and  so  estimates  can  be  obtained  by  using  only  the  solution  to  a  single  finite 
system  of  normal  equations. 

Figure  4  shows  the  real  part  of  a  signal  with  the  Gabor  representation  using  the  Gaus¬ 
sian  window  with  =  1,  and,  using  the  same  coefficients,  the  real  part  of  the  signal  with 
=  0.5.  The  signal  has  three  nonzero  coefficients  and  there  is  no  noise  present.  Using 
the  biorthogonal  function  restricted  to  the  interval  (—10, 10],  the  coefficients  were  computed 
by  using  the  technique  discussed  in  Section  4.2.7.  Z,  =  64  samples  are  taken  per  second. 
Similarly,  the  sufficient  statistics  were  computed  by  truncating  the  Gaussian  window  to  the 
interval  [-10  :  10].  However,  little  is  lost  in  the  truncation  of  the  Gaussian  window  since  it 
decays  exponentially  to  zero,  unlike  the  biorthogonal  function. 

In  order  to  compute  the  maximum  likelihood  estimates,  the  vector  of  sufficient  statistics 
has  to  be  multiplied  by  the  inverse  of  the  signal  cross- correlation  matrix,  which  is  doubly 
infinite.  Instead,  a  two-dimensional  filter  is  designed  to  filter  the  sufficient  statistics.  This 
filter  is  obtained  as  follows.  For  a  fixed  maximum  frequency  index  difference  A/  —  1  and 
delay  index  difference  -  1,  an  MN  x  M N  matrix  R  is  defined  according  to 

=  r{m-k,n-  1)  (32) 

for  m  =  0, 1, . . . ,  A/  -  1,  n  =  0, 1, . .. ,  -  1,  where  r(m, n)  is  defined  in  (31).  From  the  form 

of  (32),  R  is  a  Toeplitz-block,  block-Toeplitz  matrix.  After  R~*  is  computed,  the  middle 
row  of  the  inverse  is  extracted  because  v^Z  is  the  estimate  of  the  middle  coefficient  of 
the  MN  X  MN  grid.  The  elements  of  v  are  ordered  into  an  M  x  N  two-dimensional  filter 
which  is  used  to  filter  the  two-dimensional  array  of  sufficient  statistics.  Thus,  the  estimate 
of  each  Gabor  coefficient  is  obtained  from  the  MN  neighboring  sufficient  statistics.  Since 
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Figure  4.  The  reaJ  part  of  a  noise-free  signal  with  the  Gabor  rtj^resentation  using  the 
Gaussian  window  function  with  three  non-zero  coefficients  for  (a)  =  1  and  (b)  t=  0.5. 

The  locations  and  values  of  the  coefficients  are  indicated  in  the  figures.  The  sampling  rate 
was  L  =  64  samples  per  second. 
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r(m,  n)  decays  exponentisJly  with  m  and  n,  reasonable  values  of  M  and  N  can  be  chosen 
to  yield  nearly  optimal  results.  This  procedure  was  implemented  with  M  =  N  =  5. 

The  magnitudes  of  the  coefficients  computed  by  the  two  methods  for  the  noise-free 
signals  in  Figure  4  are  shown  in  Figure  5  for  =  1  and  in  Figure  6  for  =  0.5.  Note  that 
the  sidelobes  that  result  from  filtering  the  sufficient  statistics  could  be  removed  by  using 
a  larger  filter.  For  example,  when  r  =  0.5,  the  sidelobes  in  the  time  index  disappear  but 
the  sidelobes  in  the  frequency  index  appear.  This  is  because  the  sufficient  statistics  are 
now  more  correlated  in  the  frequency  index  as  seen  from  (31).  A  rectangular  filter  which  is 
longer  in  the  frequency  axis  would  solve  this  problem. 

Complex  zero-mean  Gaussian  white  noise  of  variance  a*  =  2  per  sample  (<7”  =  1  for 
each  the  real  part  and  the  imaginary  part,  which  are  uncorrelated)  was  added  to  the  signals 
in  Figure  4  and  the  real  part  of  these  signals  are  shown  in  Figure  7.  The  resulting  coefficients 
are  depicted  in  Figure  8  and  Figure  9.  As  seen  in  Figure  8,  the  results  for  the  maximum 
likelihood  method  are  only  slightly  better  than  those  for  the  biorthogonal  function  method 
for  =  1.  However,  the  maximum  likelihood  estimates  for  t“  =  0.5  shown  in  Figure 
9  clearly  have  better  noise  properties  than  the  other  estimates.  Heuristically,  because  of 
the  mismatching  involved  in  using  the  biorthogonal  function,  it  seems  reasonable  that  the 
maximum  likelihood  method  would  be  better  when  the  noise  is  large  or  when  t-  is  small. 
When  the  noise  is  negligible  and  r*  is  moderate,  say  >  1,  there  may  be  little  to  gain  by 
using  the  maximum  likelihood  method,  which  is  slightly  more  complicated  to  implement. 
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(a) 


(b) 


Figure  5.  The  magnitudes  of  the  coefficients  for  the  signal  in  Figure  4(a)  using  (a)  the 
biorthogonal  function  method  and  (b)  the  maximum  likelihood  method. 
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Figure  7.  The  real  part  of  the  signals  in  Figure  4  for  (a)  =  1  and  (b)  =  0.5  with 
additive  complex  white  Gaussian  noise  with  variance  cr^  =  2  per  complex  sample.  The 
sampling  rate  was  L  =  64  samples  per  second. 


118 


0  \0 


(a) 


0  \0 


(b) 


Figure  8.  The  magnitudes  of  the  coefficients  for  the  signal  in  Figure  7(a)  using  (a)  the 
biorthogonal  function  method  and  (b)  the  maximum  likelihood  method. 
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0  \0 


(a) 


0  \0 


(b) 


Figure  9.  The  magnitudes  of  the  coefficients  for  the  signal  in  Figure  7(b)  using  (a)  the 
biorthogonaJ  function  method  and  (b)  the  maximum  likelihood  method. 
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4.4  The  Exponential  Window  Function 

4.4.1  Introduction 

While  the  Gaussian  window  function  is  well  suited  for  signals  which  are  made  up  of 
pulses  with  continuous  and  unimodal  envelopes,  it  can  not  model  well  discontinuities  in  a 
signal,  such  as  the  type  encountered  in  transient  signal  analysis.  For  this  reason,  the  expo¬ 
nential  window  was  investigated  in  [4].  Exponentially  decaying  sinusoids  have  been  previ¬ 
ously  used  to  model  transient  signals,  especially  in  relation  to  Prony’s  method  [11,14,18]. 
The  exponential  window  function  5e(0  is  given  by 

5(0  =  v^exp(-A0u(0.  (33) 

where  A  is  a  parameter  and  u(t)  is  the  unit  step  function.  The  leading  constant  in  (33)  is 
chosen  so  that  /|5(0I*^^  =  1- 

4.4.2  The  Biorthogonal  Function  Method 

The  biorthogonail  function  for  the  exponential  window  is  [3,4] 

7e(t)  =  ^^^^[-u(i  -b  1)  -h  2u(0  -  u(t  -  1)]. 

This  function  is  zero  outside  the  interval  [-1:1]  which  simplifies  the  computation  of  the 
Gabor  coefficients: 

1 

Cmn  —  /  s{t  ■‘r  n)7e{t)exY>{—j2~mt)dt.  (34) 

-1 

The  window  function  and  the  biorthogonal  function  for  A  =  0.5  are  shown  in  Figure  10, 
and  those  for  A  =  4  are  shown  in  Figure  11. 

A  further  simplification  is  [3,4] 

Cmn  —  ^mn  ~  ^^P(~  —  \ 
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Figure  10.  (a)  The  exponential  window  function  aJid  (b)  the  biorthogonal  function 
7e(t)  for  A  =  0.5. 
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Figure  11.  (a)  The  exponential  window  function  gdt)  and  (b)  the  biorthogonal  function 
^e{i)  for  A  =  4. 
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where 


I 

Dmn  =  j +  n)Gxp{Xt)exp{-j2irmt)dt. 

In  [4],  the  statistics  of  the  Gabor  coefficients  in  real  white  Gaussian  noise  with  autocorre¬ 
lation  function  NQ6{t  —  a)  are  derived  and  the  results  are  included  here  for  later  reference. 
Suppose  the  NM  coefficients  {CmniO  <  m  <  M  —  1,0  <  n  <  N  -  1}  are  desired,  and  let 
Cmn  =  3?{Cmn}  and  =  ^{C'mn}-  Let  C„  be  the  (2,1/  -  1)  component  vector 

c,  =  |Co"„  c,«„  ■  ■ . cln  ■  ■  •  (35) 


and  let  C  be  the  (2A/  -  component  vector 


C  =  [Cl  (36) 

Since  Cq  ^  is  real,  Cq  „  is  omitted  in  (35).  The  covariance  matrix  for  C  is 

£1CC^]  =  A'olexp(2A)  -exp(-2AJp  ^ 

where  =  0  denotes  the  Kronecker  product,  the  N  x  N  matrix  P  is  the  tridiagonal  Toeplitz 


matrix 


1  -o 


-a  1 


Q  =  [exp(A) -I- exp(-A)]~^ 

and  the  (2M  -  1)  x  {2M  -  1)  matrix  Q  is  given  by 

A  A 

(Q  W  -  4;^2  +  4^2(n^  _  +  4^2  +  k)'^ 

T^(m-\-k)  .  ir(m-k) 

(t^lA/-lW  -  4;^2  +  4^2(^  +  +  4a2  +  4;r2(m  -  A:)2 


Tr{m  +  k) 


K(m  -  k) 


i^)k,M  1+m  ^a2  4-  47r-(m  -t-  k)-  4A-  +  47r“(m  -  k)- 


0  <  Tn,k  <  M  —  I 

0<k  <M  -I 
1  <  m  <  M  —  I 
0  <  k  <  A/  -  1 
1  <  m  <  A/  -  1 


(Q)A/-i+m,A/-l-(.fc  -  4A2  4.4;r2(m  -  A-)'^  4A-’ -h  4ir^(m  +  A:)^  '  i  L 


4.4.3  The  Maximum  Likelihood  Method  for  Finite  Observation  Intervals 
Consider  a  finite  observation  interval  I  =  [0,  and  let  the  allowable  frequency  indices 
m  satisfy  0  <  m  <  A/  -  1.  The  received  signal  r{t)  consists  of  a  real  signal  s{t)  given  by 
N-1  M-lN-1 

^(0=  51  -  n)  +  Y  -  n)cos{2nmt  +  9mn)  (41) 

n=0  m=l  n=0 

corrupted  by  additive  real,  white  Gaussian  noise  n{t)  with  £'[n(<)n(t  +  r)]  =  Ao^(r).  Below, 
the  method  described  briefly  in  Section  2.6  for  handling  the  real  signal  case  is  developed 
in  detail  using  the  example  of  the  exponential  window  function.  A  different  derivation  is 
possible  which  uses  complex  sufficient  statistics  and  matrices,  but  the  approach  taken  here 
is  to  use  all  real  signal  processing. 

The  signal  in  (41)  can  be  rewritten  in  the  form 
N-l  A/-1A'-1 

s(0  =  Y  ^O.n5«(<-”)+  Y  51 

n=0  m=l  n=0 

where 

Uo,n  =  ^0,n 

Umn  —  Amn  cos(0  mn  )  (43) 

—  Anin  sin(^rnn  )• 

For  a  given  set  of  coefficients  the  likelihood  of  the  received  waveform  is 

5  /}|{^mni  I'mn}]  =  ^  C.xp  [n(  {i/mm  1  mn  }  )/ A'o]  (44) 

where 

fi( { Vmn  1  Fmn } )  =  2  J^Si{{L  mnt  ^  mn  ])drt  -  |st({t^mn  '  ^mn  (45) 

and  K  is  a  positive  constant  that  does  not  depend  on  {b^mn.l'mn)-  Set 

/mn  =  /  ge{t  -  n)cos{2Trmt)drt 

\  (46) 

Qmn  =  J^ge(t  -  ■n)s\i\{2Tnnt)drt. 
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Then 


N-l 

I  ^t({t^mn  1  ^  ^  f'^O.n/o.n  "t"  E  E  [Umn^mn  +  ^TnnQmn]-  (47) 

n=0  m=l  n=0 

It  follows  from  (44)-(47)  that  {/mn;0  <  m  <  N  ~  1,0  <  n  <  N  —  1}  and  {Qmn\  1  S  ^  < 
iV  —  l,0<n<iV  —  1}  are  jointly  sufficient  statistics  for  the  distribution  (44).  Define  the 
following  signal  cross-correlations; 

r^^[(*:,/),(n,m)]  =  fjgt{t-l)cos{2T:kt)ge{t-n)cosi2T:mt)dt,  ^ 

0  <  k  <  M -1 

m)]  =  Jj ge{t  —  l)cos(2Trkt)gg{t  —  n)sin(2TTTnt)dt,  1  <  m  <  A/ -  1 

0  <  l,n  <  N  -  I 

1  <  it  <  A/  -  1 

r^^iikJUn  ,m)]  =  fj ge{t  -  l)sin{2Trkt)ge{t  —  n)cos{2nTnt)dt,  0  <  m  <  A/ -  1 

0  <  l,n  <  N  -  I 

rQQ[(k,l),(n,m)]  =  fj  gdl  -  i)sini27rkt)ge(t  -  n)sini27rTnt)dt,  ^  / 

(48) 

Now,  the  second  term  on  the  right  hand  side  of  (45)  can  be  written  as 
.  .tZ-l  A'-l  ,t/-i  N-l 

\st{{imn^V^n})\^dt=Y^  E  E  T.  [{k,lUn,m)]Umn 

h~0  1=0  m=0  n=0 

+2  E  E  E  E  (49) 

1:=1  1=0  m=0  n=0 
Af-1  iV-1  Af-1  N-l 

+  E  E  E  E  ^kir^^i(^'Jh(n,m)]Vmn. 

k=l  1=0  m=l  rj=0 

Define  the  (2 A/  -  1)  component  real  vector  W„  by 

W„  =  UQ  ■f^  U\^n  — l,n  ^l,n  •  •  •  KU  — l,n  (^0) 

and  the  (2A/  -  l)iV  component  real  vector  W  by 

w=  [w^  (51) 

Similarly,  define  the  (2A/  -  1)  component  vector  T„  of  sufficient  statistics  by 

—  Jfo,n  fl ,Ti  ■  ■  ■  I.M  —  l,nQl,n'''Q.\f  —  \,n  (^2) 
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and  the  (2A/  -  1)N  component  vector  T  by 


T=  [tI 


T  rj.T 


Arrange  the  (2Af  -  1)N  x  (2M  -  1)N  signal  cross-correlation  matrix  R  according  to 


Wl(2M-l)+M+k,n('iM-l)+m+l  =  [{kJ),{Tn,n)], 

(54) 

{^)l(2M-'i)+M+kM'iM-\)+M+m  =  ^(KU<  • 

Now,  (44)  can  be  expressed  as 

il{{Umn,Vmn})  =  2W^T  -  W^RW.  (55) 

The  maximum  likelihood  estimate  of  W,  Wj^jle>  maximizes  (55)  and  is  the  solution  of  the 


set  of  linear  equations 


RW  =  T, 


which,  for  R  invertible,  has  solution 


Wmle  =  R"'T. 


It  is  straightforward  to  show  that 


E  [wmle]  =  w 

E  [(Wmle  -  w)  (Wmle  -  w)^]  =  AoR-^ 


The  procedure  outlined  above  is  valid  for  any  window  function.  When  the  exponential 
window  function  is  used  with  the  observation  interval  I  —  [0,  A'],  the  signal  cross-correlations 


in  (48)  are  as  follows: 


r^^[{k,l),im,n)]  =  Xri{l,n)  ^ 


L4A2  4.  47r2(j7i  _  ky2 

4A2  +  47r2(m  4-  fc)2 

2x(m  —  k) 

27r(fc  -h  m) 

.4A2 -f  4;r-(m  —  fc)2  ' 

4A2  -f-  47r2(jn  4-  k)'^ , 

2x(fc  —  m) 

2‘x{k  -I-  m) 

.4A2 -)- 4jr2{Tn  -  fc)2  * 

4A2  -}-  47r2(m  -|-  k)^. 

)[  .  1  ....  . 

1 

where  ri(/,  n)  is  given  by 


with  I  V  y  =  max(i,y).  Because  the  signal  cross-correlations  factor  into  a  product  of 
two  terms,  one  which  depends  only  on  the  delays  I  and  n  and  one  which  depends  only  on 
the  frequencies  k  and  m,  the  signal  cross-correlation  matrix  R  factors  into  the  Kronecker 
product 

R  =  Rj  ®  R2  (58) 

where  Ri  is  the  N  x  N  matrix  with  entries 


(R-1  )u  =  exp{A(/  -f-  n)}  [exp{-2A(/  V  n)}  -  exp{-2AA'}] 


and  where  R2  is  the  (2M  -  1)  x  {2M  -  1)  matrix  with  entries 

^  .  2A2  _  2A2 

(  2}mk  =  ^^2  +  4^2(^  _  /.)2  +  4;^2  4.  4;r2(m  -I-  kj^ 


0  <  fc,m  <  A/  -  1 
0  <k  <  M  -  I 


2A7r(m-f-t)  2X-K{m  -  k)  0<k<M-l 

~  ^^2  +  4^2(^  4.  ^.)2  +  4;^2  4.  4^^2(171  -  Jt)'  1  <  m  <  A/ -  1 

^  _  2Ax(m  -f-  k)  ^  2A7r(m  -  k)  0  <  k  <  M  -  1 


_  -h  k)  2ATr(m  —  fc)  0  <  k  <  M  —  1 

(  2)M-l+m,k  ~  ^^2  4.  4;r2(m  -f-  k)-  4A'  -J-  4;r2(m  -  k)- 

2A2  2A2  _  ,  .  w  , 

-  .x2  .  a^2/ - t.\2  a \2  ,  a^2( ^  ,  i.\2  I  <  k,Tn  <  M  1. 


(R-2)A/-l-hm,A/-l-H  -  4;^2  4.  4^2^^  _  /.)2  4^2  +  4^2(^  +  ;.)2 


Note  that  the  matrix  Rt  and  the  matrix  Q  in  (40)  are  identical  to  within  a  scale  factor. 
However,  R2  is  part  of  the  covariance  matrix  of  the  sufficient  statistics,  or,  through  (57), 
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related  to  the  inverse  covariance  matrix  of  the  maximum  likelihood  coeflficient  estimates. 


On  the  other  hand,  Q  is  part  of  the  covariance  matrix  of  the  coefficients  estimated  by  the 
biorthogonal  function. 

The  fact  that  signal  cross  correlation  matrix  factors  into  a  Kronecker  product  simplifies 
the  processing  to  compute  Wml£  =  R“^T  since 

R-^  =  (Ri  0R2)~^  =  (60) 

4.4.4  Examples  using  the  Exponential  Window 

A  noise-free  signal  with  three  nonzero  coefficients  using  the  exponential  window  function 
for  X  =  0.5  is  shown  in  Figure  12  along  with  the  same  signal  corrupted  by  additive  zero- 
mean  white  Gaussian  noise  with  a  variance  of  cr-  =  1  per  sample.  The  magnitudes  of  the 
Gabor  coefficients  computed  for  each  of  the  two  methods  for  M  =  A"  =  16  for  the  noise- 
free  case  and  the  noisy  case  are  shown  in  Figure  13  and  Figure  14.  The  sampling  rate  was 
L  =  1024  samples  per  second.  Figure  15  shows  a  signal  and  its  noise  corrupted  version  with 
same  coefficients  as  that  in  Figure  12  but  with  A  =  4.  Figure  16  and  Figure  17  show  the 
corresponding  coefficients.  Overall,  the  quality  of  the  noise  free  estimates  is  nearly  identical. 
For  this  example,  the  overall  quality  of  the  estimates  in  the  noisy  case  is  roughly  the  same 
for  A  =  0.5  but  the  maximum  likelihood  method  is  clearly  superior  for  A  =  4.  This  agrees 
with  the  intuition  that  the  biorthogonal  function  method  performs  poorly  for  large  values 
of  A  in  the  presence  of  noise  because  it  amplifies  the  portion  of  the  signal  where  the  window- 
function  is  smallest,  thus  zmiplifying  the  noise. 

In  order  to  quantitatively  compare  the  performance  of  the  two  schemes,  the  variance 
of  the  estimates  of  Umn  and  V’t„„  in  (43)  are  compared.  From  (37)-(40),  the  estimation 
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(a) 


I  I I 


0  4  8  12  16 

Time  (seconds) 

(b) 


Figure  12.  A  real  signal  with  the  representation  in  (41)  using  the  exponential  window 
function  with  X  =  0.5  (a)  without  added  noise  and  (b)  with  white  Gaussian  noise  with 
variance  cr^  ~  I  added  per  sample.  The  three  non-zero  coefficients  are  shown  in  (a).  The 
s."'T.pling  rate  wa3  L  =  102^  samples  per  second  for  the  processing  but  Z.  =  64  samples  per 
second  was  used  for  this  figure. 
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Figure  13.  The  magnitudes  of  the  coefficients  of  the  signal  in  Figure  12(a)  computed 
using  (a)  the  biorthogonaJ  function  method  and  (b)  the  maximum  likelihood  method.  The 
sampling  rate  was  L  =  1024  samples  per  second. 
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Figure  14.  The  magnitudes  of  the  coefficients  of  the  signal  in  Figure  12(b)  computed 
using  (a)  the  biorthogonaJ  function  method  and  (b)  the  maximum  likelihood  method.  The 
sampling  rate  was  L  =  1024  samples  per  second. 
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(a) 


Figure  15.  A  real  signal  with  the  representation  in  (41)  using  the  exponential  window 
function  with  A  =  4  (a)  without  added  noise  and  (b)  with  white  Gaussian  noise  with 
variance  =  1  added  per  sample.  The  three  non-zero  coefficients  are  shown  in  (a).  The 
sampling  rate  was  L  =  1024  samples  per  second  for  the  processing  but  L  =  64  samples  per 
second  was  used  for  this  figure. 
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Figure  16.  The  magnitudes  of  the  coefficients  of  the  signal  in  Figure  15(a)  computed 
using  (a)  the  biorthogonaJ  function  method  and  (b)  the  maximum  likelihood  method.  The 
sampling  rate  was  L  =  1024  samples  per  second. 
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(a) 
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(b) 

Figure  17.  The  magnitudes  of  the  coefficients  of  the  signal  in  Figure  15(b)  computed 
using  (a)  the  biorthogonal  function  method  and  (b)  the  majcimum  likelihood  method.  The 
sampling  rate  was  L  =  1024  samples  per  second. 
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variance  for  the  biorthogonal  function  method  is 


w_-rrr  ,  _  ^o[exp(2A)  -  exp(-2A)]  f  1  ,  A  ] 

‘  “  2A  UX  4A2  +  I67r2m2 J 

iVo[exp(2A)  -  exp(-2A)) 


Var[V;n„]  = 


2A 


4A2  +  167r2m2 
A 

[4A  4A2  +  167r2m2 


(61) 


The  variances  for  the  maximum  likelihood  estimates  of  Umn  and  V^n  are  found  according  to 
(57)  to  be  the  product  of  Nq  and  the  corresponding  element  in  on  the  main  diagonal. 
The  inverse  was  computed  for  M  =  N  =  16  using  the  simplified  structure  in  (60).  Let 
Pi;{Tn,n)  and  py{m,Ti)  be  the  reduction  in  estimation  variance  for  U^n  and  obtained 
using  the  maximum  likelihood  estimates  instead  of  those  obtained  by  the  BFM,  that  is 


Define  the  following; 


Pu{m,n) 


PV'(m,n) 


Var 

(i^mn)BFM 
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(t^mn)MLE 

\’ar 

(14in)BFM 
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m=0  n=0  m=l  n=0 


These  quantities  were  computed  for  M  =  iV  =  16  for  A  €  [0.1,10]  and  the  results  are 
depicted  in  Figure  18.  This  figure  shows  that  the  maximum  likelihood  estimates  have 
uniformly  lower  variance  than  the  other  estimates  as  guaranteed  by  the  theory.  Note  that 
little  is  gained  on  the  average  using  the  maximum  likelihood  method  for  small  values  of  A, 
whereas  a  reduction  of  at  least  a  few  dB  can  be  expected  for  A  >  2. 
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Figure  18.  The  maximum,  average,  and  minimum  reduction  in  the  estimation  variance 
obtained  by  using  the  maximum  likelihood  method  instead  of  the  biorthogonal  function 
method  to  calculate  the  coefficients.  The  signal  under  consideration  is  a  real  signal  with 
the  Gabor  representation  with  M  =  N  =  16  using  the  exponential  window  function  with 
parameter  A. 
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4.4.5  Detection  for  Finite  Observation  Intervals 

In  [4],  the  following  detection  problem  was  analyzed.  Under  the  null  hypothesis  Hq 
only  white  Gaussian  noise  is  present.  Under  the  alternate  hypothesis  Hi,  the  received  signal 
consists  of  a  transient  signal  together  with  additive  white  Gaussian  noise.  The  transient  is 
given  by  the  representation  in  (41)  where  only  the  coefficients  {/ImniO  <m<A/-l,n6  A’’} 
can  be  non-zero,  where  is  a  subset  of  A'  integers  in  the  range  [0,  N  -  1].  Basically, 
this  detection  problem  amounts  to  deciding  if  a  transient  begins  at  a  specified  time  with 
unknown  frequency  index.  Since  the  values  of  the  non-zero  coefficients  of  the  transient 
signal  are  assumed  to  be  unknown,  a  generalized  likelihood  ratio  test  is  derived  which  uses 
the  coefficients  derived  from  the  BFM  in  [3,4].  The  detector  is  summarized  here.  Reorder 
the  vector  C  in  (36)  into  the  vector  x  so  that 


X 


T 


where  Xi  is  the  (2A/  -  1)A'  component  vector  corresponding  to  the  nonzero  coefficients  in 
the  transient  signed  and  X2  is  the  (2M  -  1){N  -  K)  component  vector  corresponding  to  the 
remaining  coefficients.  Let  A  be  given  by  A  =  £[CC^]  which  is  defined  in  (37)  and  has 
been  reordered  to  correspond  to  the  new  ordering  of  C,  and  let  T  =  A~^.  This  reordering 
admits  the  following  partitioning  of  A 


A  _  All  Ai2 

[A21  A22.  ■ 

The  test  statistic  derived  from  the  generalized  likelihood  ratio  test  Ti  is  ^ven  by  [see  4  for 
details] 

Ti  =  x^A“'x  -  xrA7o'x2.  (62) 


It  is  shown  that  under  Hq,  Ti  is  chi-square  distributed  with  (2.1/  -  1)A’  degrees  of  freedom. 
Under  Hi,  Ti  has  the  noncentral  chi-square  distribution  with  (2,1/  -  1)A’  degrees  of  freedom 
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and  noncentrality  parameter  given  by 

=  uf  (Aii  -  AioA.Jo^Aoi)  ^  ui 

=  uj’riiui 

where  u  is  the  vector  of  the  true  Gabor  coefficients  of  the  noise-free  transient. 


(63) 


The  same  detection  problem  is  now  considered  using  the  maximum  likelihood  approach 
in  order  to  determine  if  the  lower  estimation  variance  of  the  maximum  likelihood  estimates 
translates  into  better  detection  performance  as  well.  Again,  a  generalized  likelihood  ratio 
is  used  to  determine  the  test  statistic  Ti 


To  =  max  21ogP 
n6//} 

-  2  log  P 


{r^,  t  G  [0,A]}  I  } >  f^mn  —  ^mn  —  0  for  n  ^  A 

{rj,  t  €  [0i  A]}  I  Umn  —  =  0  Vm,n 


=  "rr*  max  ^{{Umni^mn})- 

{C/mn,Vmn;  n€Af} 

Again,  reorder  the  coefficient  vector  W  of  (51)  into  a  vector  y  so  that 


(64) 


T  T 

=  [yi  y-i 


where  yi  is  the  (2m  -  l)/v  component  vector  corresponding  to  the  non-zero  coefficients  of 
the  noise  free  transient  and  y2  is  the  (2m  -  1)(7V  -  A')  component  vector  corresponding  to 
the  other  coefficients.  Similarly,  reorder  the  vector  T  of  sufficient  statistics  in  (53)  into  a 
vector  z  =  [zj  zj]^.  Also,  define  a  matrix  S  by  reordering  the  matrix  (l/Ao)R  to  match 
y  and  z,  where  R  is  the  signal-cross  correlation  matrix  in  (54).  The  matrix  S  admits  the 
partition  corresponding  to  y  and  z. 


S  = 


Sii 

.S21 


S12 

S22 


It  follows  from  (55)  and  the  fact  that  y2  =  0  that 


n({f^mn,  Vmn})  =  2yfzi  -  A'oyfSiiyi 


(65) 
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and  that  z\  is  the  vector  of  the  jointly  sufficient  statistics  for  this  detection  problem.  The 
vector  yi  which  maximizes  (65)  for  Sn  invertible  is 


yi  =  (66) 

J'O 

for  which  the  achieved  maximum  is 

max  ,  n({f/'mn,Vmn})  =  Zl  =  A'oyrSiiyi. 

{t^mn»Wnn»  n€A  }  -^0 

t^mn=^mn=0 

Thus,  the  test  statistics  is 

7'2  =  yrSiiyi.  (67) 

Note  that  the  vector  yj  is  not  the  same  as  the  first  partition  of  the  overall  maximum 
likelihood  vector  W^(le  computed  without  restricting  any  coefficients  to  be  zero.  It  is, 
however,  the  projection  onto  the  subspace  determined  by  the  non-zero  coefficients. 

Since  Sn  is  symmetric  and  positive  semi-definite,  by  the  Cholesky  decomposition  there 
exists  a  lower  triangular  matrix  L  such  that  LL^  =  Sn.  In  light  of  (57),  L^yi  is  a  vector 
of  independent  and  identically  distributed  Gaussian  random  variables  with  zero  mean  and 
a  variance  of  one.  It  follows  that  Tt  has  the  chi-square  distribution  with  (2M  -  1)A'  degrees 
of  freedom  under  Hq-  Under  //j,  T-y  is  noncentral  chi-square  with  i2M  —  l)A'  degrees  of 
freedom  and  noncentrality  parameter  given  by 

<5|  =  ufSiiU2.  (68) 


Using  the  approach  in  [4]  another  detector  can  be  developed  which  uses  the  maximum 
likelihood  estimates.  To  apply  the  same  technique  used  to  derive  (62),  let  w  = 
be  the  reordered  vector  of  global  maximum  likelihood  estimates  W^jle  (^^)-  From  (57), 
the  covariance  matrix  E  of  w  is  E  =  S“U  The  new  detection  statistic  T3  is  then  exactly 
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as  in  (62),  with  w  and  E  replacing  x  and  A, 

T3  =  w^E~^w  -  wTeJ2^W2.  (69) 

Using  the  matrix  identity 

Eoo^  —  S22  “  ^21^1/^12  (^®) 

T3  can  be  expressed  in  terms  of  the  signal  cross-correlation  matrix  by 

T3  =  w^Sw  -  Wo  (S22  -  S2iS^/Si2)  W2.  (71) 

There  are  two  things  to  note  about  T3  from  (69).  First,  this  detection  statistic  is  a  function 
of  the  global  maximum  likelihood  estimates,  not  the  local  ones  as  in  To.  Second,  since 
w  =  (l/iVo)S“^z,  T3  is  apparently  a  function  of  all  the  components  of  z,  not  just  the 
sufficient  statistics  which  were  shown  to  be  zi  by  (65). 

Once  again,  r3  has  the  chi-square  distribution  with  (2M  —  l)K  degrees  of  freedom 
under  Hq.  Under  Hi,  T3  is  noncentral  chi-square  with  (2.1/  -  1)A'  degrees  of  freedom  and 
noncentrality  parameter  ^3  given  by 

^3  =  (^11  “  ^12^^/221)  u 

=  ufSiiUi.  (72) 

By  comparing  62  (68)  and  6^  (72),  it  is  apparent  that  To  and  T3  have  identical  distributions 
under  Hq  and  Hi,  and,  therefore,  they  have  the  same  performance.  It  will  now  be  shown 
that  the  two  statistics  are  in  fact  equal. 

Proposition.  T2  =  Tq. 

Proof.  From  (70)  and  using  the  identities  w  =  (l/A'o)S~^z  and  E  =  S~\ 

NqTq  =  Z^Ez  —  [E21Z1  -t-  S22Z2]^  "^22  1^21^1  ^22^2] 
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=  Zi  jZi  +  z^Si2Z2  "t"  zT S21Z1  +  zT ^22^2 

—  z|'Si2SJ2^E21Z1  —  Zi^Si2S22*S22Z2  “  22^22272^  S21Z1  —  Z2S22^ ^2^  ^22*2 

=  Zj  SiiZi  —  Zj  Si2S22^  ^21Zi 

=  zj  ^Eii  -  Ei2E22^E2i)  Zi 
=  zfsf/zi 

=  A^owTSiiWi 

=  NiT2.  U 


The  implication  of  this  result  is  that  the  test  statistic  T3  in  fact  only  uses  the  sufficient 
statistics  and  no  more,  even  though  it  may  appear  otherwise. 

The  value  of  the  second  expression  for  the  detection  statistic  can  be  seen  from  the 
following  scenario.  Suppose  one  is  interested  in  testing  several  alternatives  against  the  nuD 
hypothesis.  Different  alternatives  may  have  one  or  more  non-zero  coefficients  in  common. 
Using  the  test  statistic  T2,  the  local  maximum  likelihood  estimates  (LMLE)  must  be  com¬ 
puted  for  each  alternative,  and  the  coefficients  in  common  to  more  than  one  hypothesis  will 
have  different  LMLE’s.  However,  if  the  statistic  is  used  instead,  the  global  maximum 
likelihood  estimates  are  computed  for  all  the  Gabor  coefficients  at  once.  These  estimates 
are  then  processed  according  to  (69)  for  each  alternate  hypothesis.  This  may  be  more  effi¬ 
cient  and  also  less  ambiguous  about  estimates  of  coefficients  in  common  to  more  than  one 
alternative.  Furthermore,  it  may  be  important  from  a  data  visualization  standpoint  to  have 
estimates  for  all  the  Gabor  coefficients,  and  thus  the  statistic  T3  is  the  natural  choice. 

Note  that  the  noncentrality  parameters  Sf  in  (63),  (68),  and  (72)  are  generalized  signal- 
to-noise  ratios.  They  are  linearly  proportional  to  HuH'  and,  since  Fn  in  (63)  and  Sn  in 
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(68)  and  (72)  are  inversely  proportional  to  Nq,  they  are  inversely  proportional  to  Since 
the  distributions  of  Ti  and  T2  (or  T3)  have  the  same  distribution  under  J/q  and,  aside  from 
the  different  noncentrality  parameters,  they  also  have  the  same  distribution  under  /fj,  it 
follows  that  the  gain  of  one  detection  scheme  over  the  other  is  fully  determined  by  the 
noncentrality  parameters.  Accordingly,  define  the  detection  gain  G(u)  in  signal-to-noise 


to  have  the  same  probability  of  detection  as  the  maximum  likelihood  detector  T2  or  T3  for 
a  fixed  probability  of  false  alarm.  It  is  easily  shown  to  be  independent  of  ||u|l^  and  Nq. 


Let  e,  be  the  (2A/  -  1)A^  component  vector  such  that  (e,),  =  1  and  (e,)j  =  0  for  j  ^  i. 
Define  the  following  quantities: 

^  (2.U-1)A- 

{2M  “l)A-  ^ 

^  (2.U-I)A' 

<^dB-avg  =  -'i)A’  ^  10’oe,oG(e,). 

These  correspond  to  the  maximum,  minimum,  and  average  gains  for  detecting  a  transient 
signal  which  has  a  single  nonzero  coefficient.  In  Figure  19,  these  quantities  are  shown  for 
M  =  N  =  16  and  A  6  [0.1,10].  These  curves  point  to  the  same  conclusion  as  the  curves 
showing  the  reduction  in  estimation  variance  in  Figure  18.  Specifically,  the  maximum 
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Figure  19.  The  maximum,  average,  and  minimum  gains  in  signaJ-to-noise  ratio  defined 
by  the  additional  SNR  required  by  the  detector  using  the  biorthogonaJ  function  method  to 
attain  the  same  performance  as  the  detector  using  the  maximum  likelihood  method.  The 
signal  under  consideration  is  a  real  signal  with  a  single  non-zero  coefficient  in  the  Gabor 
representation  with  M  =  N  =  16  using  the  exponential  window  function  with  parameter  A. 
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likelihood  detector  T2  or  r3  is  always  better  than  the  detector  Ti,  and  on  the  average  it 
will  be  severed  dB  better  for  A  >  2. 


4.5  The  Exponential  Window  with  Infinite  Observation 
Intervals  and  Sequential  Transient  Detection 
4.5.1  Introduction 

The  processing  of  a  block  of  data  representing  a  finite  observation  interval  can  be 
important  once  this  block  has  been  identified  as  w'arranting  further  analysis.  However, 
transient  signals  have  unknown  arrival  times,  and  in  many  applications  data  is  arriving 
continuously.  For  these  reasons,  it  is  more  practical  to  consider  signal  processing  schemes 
which  process  the  incoming  data  sequentially.  Thus,  when  a  transient  actually  arrives,  it 
can  be  detected  with  little  delay  so  that  further  processing  and  action  can  be  taken. 

With  this  philosophy  in  mind,  the  ma,ximum  likelihood  technique  for  determining  the 
Gabor  coefficients  using  an  exponential  window  is  analyzed  when  the  observation  interval  is 
the  entire  time  line,  that  is  <  e  HI.  At  first  glance,  this  would  seem  to  make  the  processing 
unrealizable  since  the  sufficient  statistics  now  require  matched  filtering  an  infinite  interval  of 
data.  However,  it  will  be  shown  that  the  processing  actually  simplifies  and  is  realizable  when 
an  infinite  observation  interval  is  considered.  Moreover,  the  ma.ximum  likelihood  processing 
results  in  the  sequential  processing  of  the  data  and  in  the  sequential  computation  of  the 
Gabor  coefficients.  Using  the  sequential  estimation,  a  scheme  is  proposed  and  evMuated  for 
the  sequential  detection  of  transient  signals. 


4.5.2  Maximum  Likelihood  Estimation  of  the  Gabor  Coefficients  for 
Infinite  Observation  Intervals 

There  are  two  ways  to  analyze  the  computation  of  the  maximum  likelihood  estimates  of 
the  Gabor  coefficients  when  an  in/inite  observation  interval  is  considered.  One  uses  filters 
and  sequences  and  the  other  uses  matrices  and  vectors.  Because  the  finite  observation 
interval  problem  has  already  been  presented  using  matrices  and  vectors,  it  will  be  more 
intuitive  to  continue  using  this  approach  here.  It  should  be  noted,  however,  that  both 
methods  are  equivalent.  The  matrices  and  vectors  in  this  situation  are  “doubly-infinite”  so 
that  a  matrix  A  has  entries  Amn  for  m,n  E  ^  and  a  vector  v  has  entries  v„  for  n  €  Z. 
Pictorially,  the  matrix  can  thought  of  as 


^-1,-1  -4-1,0  ^-1,1 

.4o,_i  >lo,i 

Ai,o  -4],] 


and  the  vector  can  be  symbolized  by 


T' 

=  I’o  t'l  •••]  . 


The  analysis  in  Section  4.4.3  concerning  the  sufficient  statistics,  the  signal  cross- 
correlations,  and  the  functional  form  of  the  maximum  likelihood  estimates  applies  to  the 
present  situation  as  well,  setting  /  =  IR.  Now,  the  sufficient  statistics  are  given  by 


00  00 

^mn  —  /  g^{t  -  n)cos(27rm/)(/r<  =  j  -  n)cos(2nmt)drt 


—  OO 
00 


n 

oc 


(73) 


Qmn  =  j  ge{t  -  Ti)s\n{2TTmt)drt  =  J  gdt  -  n)sin{2Kmt)drt. 


These  sufficient  statistics  are  not  realizable  because  they  require  observations  from  the  semi- 
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infinite  interval  [n,oo).  The  signal  cross-correlations  in  (48)  are  easily  evaluated,  yielding 
r"[(*,0,(m,„)l  =  2A^xp{-A|„  -  /|}  [:;p  4  j(„  _  +  4V -f  4  j(m  +  t)^, 

r'«[(i.i),(m,„)l  =  Aexp{-A|4  -  ^  W 


2T{k  —  m)  2ir(k  -|-  m)  . 

4A-  45r-(m  —  k)~  4X^  +  4x2(m  4-  it)^] 


(»«.«)]  =  Aexp{-A|n  -  /|} 

r««I(t,0,(m,n)l  =  2A2exp{-A|4  -  /|}  -  ^q.--4;2(„,  ^ 

Once  again,  the  signal  cross-correlations  factor  into  a  product  of  two  terms,  one  depending 
only  on  the  difference  in  delay  [n  — 1\  and  the  other  depending  only  on  the  frequency  indices 
m  and  n.  Thus,  the  signal  cross-correlation  matrix  R  factors  into  the  Kronecker  product 
R  =  R]  0  R2  where  Rj  is  the  doubly-infinite  Toeplitz  matrix  with  entries 


(R-l)/,n  =  exp{-A|n  -  /|} 

and  R2  is  the  same  matrix  as  before  (59).  In  order  to  compute  the  maximum  likelihood 
estimates,  the  inverse  of  R  and,  hence,  the  inverse  of  Ri  are  required.  The  inverse  of  a 
doubly-infinite  Toeplitz  matrix  can  be  found  by  computing  the  Z-transform  of  the  sequence 
of  coefficients  in  any  column  of  the  matrix,  and  finding  the  power  series  expansion  of  its 
reciprocal.  Specifically, 

Rx{z)=  f;  (R,)„,o-'" 

n=— 00 

=  H  exp{-A|n|}c" 

n=— cx) 

00  00 

=  X]exp{-An}’-”  -H  J])e,xp{-An}^"  -  1 
0  0 

= _ ^ ^ _ 1 

1  — exp{-A}i~^  1  — exp{-A}2 

_  l-exp(-2A} 

-  exp{-A}z“^  +  (1  +  exp{-2A})  -  exp{-A}z' 

Thus,  the  reciprocal  of  the  Z-transform  is 

1 _ -  exp{-A}  _j  1  +  exp{-2A}  exp{-A} 

i2i(2)  l-exp{-2A}^  ^l-exp{-2A}  l-exp{-2A}‘' 
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from  which  it  follows  that  Rj  ^  is  symmetric  tridiagonal  Toeplitz  with  entries 


’  1  ■fexp{-2A} 

1  -  exp{-2A}  ’ 
<  -exp{-A} 

1  -  exp{-2A}  ’ 

lo, 


if  n  =  / 


if  |n  -  11  =  1 
if  |n-/l  >  1. 


(74) 


The  structure  of  the  vector  of  sufTiciont  statistics  T  which  is  composed  of  the  doubly- 
infinite  sequence  of  (2A/  -  1)  component  vectors  Tn  in  (52)  is 

T=[...Tl,Tf  xr-.f. 

Define  a  doubly-infinite  sequence  of  {2M  —  1)  component  vectors  t„  by 

-exp{-A}  ^  ,  l+exp{-2A}^  ,  -exp{-A}  ^ 

”  l-exp{-2A}  l-exp{-A}  "  1  -  exp{-2A} 

The  sequence  {t,,}  is  the  result  of  filtering  the  sufficient  statistics  with  RJ"'.  In  keeping 
with  the  structure  of  Tn,  let  the  components  of  tn  be  denoted  imn  and  qmn  according  to 

r.  .  .  iT 

—  (*0,n  M,n  ■  ■  ■  *,U— l,n  9l,n  '  ■  ■  —  l,n 


Define  the  function  /i(t)  by 


^(0  =  -  1) 


1  -  exp{-2A} 
v/2A 


1  -  exp{-A}  l-exp{-2A}‘ 

exp{-At}  [-exp{-2A}u(t  +!)  +  (!+  exp{-2A})  u(t)  -  u(t  -  1)] 


1  -  exp{-A} 

Note  that  h{t)  is  zero  outside  the  interval  [-1,1]-  It  can  easily  be  checked  that  imn  and 
qmn  are  given  by 

1 

^mn  —  /  h(t  -  n)  cos{27rmt)drt 


-1 


k 

qmn  =  j  h(i  -  n)sm[2~mt)drt. 
-1 
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These  equations  are  reminiscent  of  the  filtering  in  the  biorthogonal  function  method  in 
(34).  The  filter  h(i)  is  shown  in  Figure  20  for  A  =  0.5  and  A  =  4  for  comparison  to  the 
biorthogonaJ  function  in  Figure  10  and  Figure  11.  It  is  apparent  that  h{t)  performs  matched 
filtering,  in  essence,  by  amplifying  regions  with  large  signal  values  and  attenuating  areas 
with  small  signal  values.  This  is  in  contrast  with  the  biorthogonal  function  which  does  the 
reverse.  Also,  the  calculation  of  the  statistics  {imn)9mn}  is  realizable  and  does  not  require 
the  calculation  of  {Imn^Qmn}-  A  further  simplification  of  the  computation  can  be  made 
by  setting 


*mn  — 


9mn  — 


\/n 

1  -  exp{-A} 

V2X 

1  -  exp{-A} 


1 

J  exp{—Ai}  cos(27rmt)dri^„ 
0 

1 

J  exp{  — Af}  sin(27rm<)(ir<+„ 
0 


so  that 

imn  =  (1  +exp{-2A})T,„,„  -  exp{-2A}T,„,„_l 
9mn  =  (1  +exp{-2A})^,„.„  -  exp{-2A}9^,„_i . 


(75) 


The  {2M  -  1)  component  of  ma.ximum  likelihood  estimates  of  the  coefficients  with  time 
index  n,  Wn  with  form  (50),  is  given  by 

W„  =  Rjh„. 

Thus,  the  calculation  of  the  coefficient  estimates  for  the  time  index  n  depends  only  on  the 
observations  from  the  interval  [n  -  l,n  +  1].  This  yields  the  desired  sequential  implemen¬ 
tation. 


Because  a  simple  analytic  form  is  available  for  estimator  variance  of  the  two 

schemes  can  be  analytically  compared  when  M  =  1,  that  is  when  only  baseband  transients 
are  considered.  Then,  R2  reduces  to  a  scalar.  The  variances  are  found  from  (57),(59).(61), 
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(b) 


Figure  20.  The  filter  h(t)  used  in  the  sequential  computation  of  the  maximum  likelihood 
coefficient  estimates  for  (a)  A  =  0.5  and  ^b)  A  =  4. 


1.50 


and  (74)  to  be 


Var  =  •^sinh(2A), 

^  (76) 

=  Aocoth(A) 

where  A„  is  the  coefficient  of  the  basebcind  pulse  with  delay  n.  The  variances  in  (76)  are 
plotted  as  a  function  of  A  in  Figure  21.  The  figure  shows  that  the  maximum  likelihood  esti¬ 
mate  has  uniformly  lower  variance  than  the  BFM  estimate,  and  the  reduction  in  estimation 
variance  is  most  pronounced  for  large  A. 

4.5.3  Sequential  Transient  Detection 

Using  the  results  of  the  previous  section,  a  detection  scheme  is  proposed  which  uses 
the  maximum  likelihood  estimates  of  the  coefficients  computed  by  the  sequential  algorithm. 
Specifically,  a  quadratic  form  of  the  estimated  Gabor  coefficients  at  each  time  index  and 
frequency  index  is  compared  to  a  threshold.  If  one  of  the  test  statistics  exceeds  the  threshold, 
the  presence  of  a  transient  is  declared.  This  test  also  provides  transient  classification  because 
the  positive  test  yields  the  transient  frequency  and  delay.  The  details  are  as  follows. 

Let  Tmn  be  the  test  statistic  for  a  transient  with  frequency  index  m  and  delay  n.  It  is 
defined  as  follows.  For  m  =  0,  reorder  the  signal  cross-correlation  matrix  S  so  that 

(S)i,i  =  ^'■^^[(0,0),(7i,rj)]. 

Then,  To  n  is  simply 

To.n  = 

For  the  other  frequency  indices,  a  transient  with  a  particular  frequency  index  and  phase 
shift  would  show  up  in  both  b'mn  and  Vmn  so  that  a  quadratic  form  is  desired  that  combines 
the  two.  Fix  m  and  n.  Reorder  the  signal  cross-correlation  matrix  so  that  the  Sn  is  the 
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Figure  21.  The  variances  of  the  estimates  of  the  coefficients  by  the  sequential  processing 
implementing  the  biorthogonal  function  method  and  the  maximum  likelihood  method.  The 
signal  under  consideration  is  a  real  signal  with  the  Gabor  representation  with  A/  =  1 
(baseband  only  transients)  using  the  exponential  window  function  with  parameter  A. 
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2x2  matrix  corresponding  to  the  pair  (m,n),  that  is 

g  1  r^^[(m,m),(n,n)]  77i),(n,  n)] 

■^0  [r^^[(m,m),(n,n)]  rW[(rn,m),(7i,n)] 

The  covariance  matrix  of  Wmn  =  \Umn  ^mnj  is  then  Sn  =  (S“^)  jj-  It  is  easy  to  show 
that  the  generalized  likelihood  ratio  to  test  the  hypothesis  “‘'Wmn  ~  iV(u,  Sji),”  where 
u  6  IR"  is  an  arbitrary  non-zero  vector,  versus  the  hypothesis  “'Wmn  ~  is 

simply 

Tmn  =  ^Inn^uWmn, 

which  is  the  test  statistic  for  a  transient  with  frequency  index  m  ^  0  and  delay  n. 

Note  that  this  test  statistic  was  developed  using  the  maximum  likelihood  estimates  as 
the  observations,  not  from  the  original  observations.  In  terms  of  the  original  observations, 
this  test  corresponds  to  the  generalized  likelihood  ratio  for  testing  arbitrary 

k,l  e  2}”  against  “{Uf-i,  14/  arbitrary  k  ^  m,  I  n,  Umn  =  Vmn  =  0}”  as  is  shown  below. 
The  generalized  likelihood  l{z)  ratio  reduces  to 

/(z)  =  m^x  (2y^z  -  iNoy^Sy^  -  rnax  (2yjz  -  A'oy2^Sy2) 

=  -^z^S~^z  - 

No  Nq  ■  --  * 

=  — Z^Sz  -  (^22  ~  ^21^1/^12)  ^2 

=  —  ^zJ'SijZi  -H  2z|^Ei2Z2  +  Z2E2]EJ'j*Ei2Z2) 

=  ^  (Siizj  +  E2iZ2)^  (SiiZ]  -f  E21Z2) 

=  Tmn- 

This  test  is  in  contrast  with  the  fi.xed  sample  size  test  in  Section  4.4.5  because  it  Wcis  assumed 
there  that  all  the  coefficients  are  zero  except  possibly  the  ones  of  interest.  Here,  where  there 
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is  an  infinite  number  of  coefficients,  that  is  not  a  realistic  assumption.  Thus,  the  coefficients 
of  interest  are  tested  for  being  non-zero  while  not  restricting  the  other  coefficients  to  be 
zero. 


The  sequential  detection  algorithm  with  thresholds  ho  a-iid  h  is  simply 

7b, n  >  ^0  =>■  Declare  transient  present  in  bin  (0,n) 

T\^n  >  h  =^Declare  transient  present  in  bin  (l,n) 

7’A/-i,n  >  h  =» Declare  transient  present  in  bin  (M  -  l,n) 
Continue,  n  =  n  -f-  1. 


Equivalently,  for  each  n,  the  maximum  of  the  statistics  {To,n/ho;  T^n/h,  m  =  1,. ..  A/  - 
1}  could  be  compared  to  a  threshold  of  one.  This  could  be  generalized  to  detect  more 
complicated  transients  which  involve  several  damped  sinusoids  by  comparing  some  function 
of  the  k  out  of  M  largest  order  statistics  to  a  threshold. 


The  distribution  of  the  test  statistic  To,n  is  chi-square  with  one  degree  of  freedom  when 
the  corresponding  coefficient  i4o,u  is  zero.  It  is  non-central  chi-square  with  one  degree  of 
freedom  with  noncentrality  parameter  „  given  by 

=  (s-X„ 

when  i4o,n  ^  0-  The  distribution  of  the  test  statistic  Tmn  for  m  ^  0  is  chi-square  with 
two  degrees  of  freedom  when  no  transient  is  present  and  is  non-central  chi-square  with  two 
degrees  of  freedom  and  noncentrality  parameter  given  by 


c2  _  i« 

^mn  — 
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rp 

where  Umn  =  [Umn  V’mn]  is  the  true  coefRcicnt  vector. 

For  a  given  threshold  /i,  the  threshold  Hq  can  be  set  according  to 

^{XI  >  ho)  =  P{xl  >  h},  (77) 

where  Xj  and  X2  are  chi-square  random  variables  with  one  and  two  degrees  of  freedom.  This 
means  that  the  probability  of  false  cdarm  Pj^  for  the  decision  for  each  coefficient,  which  is 
given  by  the  expressions  in  (77),  will  be  the  same  for  all  m.  In  order  to  determine  the  mean 
time  between  false  alarms,  the  following  convention  is  assumed.  For  each  fixed  n,  if  one  or 
more  of  the  test  statistics  exceed  the  threshold,  it  is  considered  as  a  single  false  alarm.  If  an 
alarm,  false  or  otherwise,  is  detected  at  time  n,  then  any  alarm  at  time  n  -|- 1  is  not  counted 
as  a  separate  alarm.  Using  these  conventions,  and  the  fact  that  the  maximum  likelihood 
coefficients  for  bins  (kj)  and  (m,n)  are  independent  for  |n  -  /|  >  1  as  a  consequence  of 
(74),  the  mean  time  between  false  alarms  Tf^  is  bounded  by 

_J_<T 
MPfa-  ^^-Pfa' 

The  probability  of  detection  P^  is  given  by 

p  _  f  ^{Xi(^0,n)  >  *0},  for  m  =  0 
^  \p{x2i^m,n)  >  k},  for  m  0, 


where  X^(^^)  is  a  non-central  chi-square  random  variable  with  i  degrees  of  freedom  and 
noncentrality  parameter  6^. 


4.6  Conclusion 


In  this  chapter,  the  Gabcr  representation  was  discussed  because  of  its  desirable  localiza¬ 
tion  properties  in  both  time  and  frequency.  Two  methods  of  computing  and  estimating  the 
Gabor  coefficients  were  examined.  The  maximum  likelihood  estimates  and  detectors  based 
on  them  were  shown  to  be  superior  to  those  obtained  by  the  more  traditional  approach 
using  the  biorthogonaJ  function.  Using  an  exponential  window  function,  the  Gabor  repre¬ 
sentation  was  applied  to  the  detection  of  transient  signals.  A  simple  sequential  detection 
procedure  was  gi'^en  to  detect  transient  signals  using  the  maximum  likelihood  estimates. 

There  are  several  avenues  for  further  research.  Throughout  the  chapter  it  has  been 
assumed  that  the  delays  and  frequencies  could  only  take  on  a  discrete  set  of  values  indexed 
by  the  integers.  It  would  be  necessary  for  a  real  appOcation  to  determine  the  performance 
when  this  assumption  is  not  valid.  Also,  it  was  assumed  that  r"  for  the  Gaussian  window  and 
A  for  the  exponential  window  were  known  a  priori.  This  may  not  be  the  case  in  reality.  Thus, 
the  effect  of  the  window  mismatch  could  be  e.xamined.  Perhaps  am  optimal  window*  function 
for  a  particular  application  could  be  designed  to  minimize  the  mismatching  effects,  or  a  bank 
processors  could  be  used  to  implement  several  different  window  functions  simultaneaously. 
Finally,  detection  procedures  using  the  Gabor  representation  could  be  compared  to  more 
traditional  signal  processing  techniques  and  representations  such  as  Prony’s  method  and 
the  short-time  Fourier  Transform. 
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The  Optimal  Sample  Size  for  Transient  Detection 


5.1  Introduction 


Often  when  sequential  data  processing  algorithms  are  implemented  in  hardware,  the 
data  are  grouped  into  fixed  size  blocks  to  facilitate  the  implementation.  An  example  of 
this  type  of  processing  is  a  hardware  implementation  of  the  Fast  Fourier  Transform  which 
might  be  used  to  sequentially  compute  short  time  Fourier  Transforms  as  data  arrive.  The 
same  approach  could  be  used  to  compute  the  Gabor  coefficients  discussed  in  Chapter  4.  In 
that  context,  the  transient  model  prescribed  the  optimal  block  size  for  the  processing.  The 
problem  considered  in  this  chapter  is  to  find  the  optimal  sample  size  or  integration  period 
for  the  detection  of  a  transient  which  arrives  at  an  unknown  time. 

The  approach  using  fixed  sample  size  tests  in  place  of  a  fully  sequential  algorithm 
was  first  considered  for  the  quickest  detection  problem  [2,3,4].  In  [4],  Shiryaev  considered 
the  quickest  detection  problem  for  detecting  an  abrupt  change  in  the  drift  of  a  Brownian 
motion.  In  the  fixed  sample  size  approach  which  he  called  the  Neyman-Pearson  method, 
the  test  statistics  were  based  on  observations  from  consecutive  intervals  of  fixed  length.  He 
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then  considered  the  interval  length  that  would  minimize  the  mean  delay  in  the  detection 
of  the  change  for  a  fixed  mean  time  between  false  alarms  when  the  change  is  assumed  to 
begin  with  uniform  distribution  during  the  interval  in  which  it  actually  occurs.  Pelkowitz 
and  Schwartz  [2]  and  Pelkowitz  [3]  considered  the  quickest  detection  problem  in  discrete 
time.  By  using  the  Central  Limit  Theorem  for  large  sample  sizes,  they  were  able  to  find  the 
optimal  sample  sizes  for  Gaussian  and  non-Gaussian  samples.  Two  different  criteria  were 
examined,  the  mean  criterion  and  the  minimax  criterion.  The  mean  criterion  is  the  same 
one  that  Shiryaev  used,  i.e.,  minimize  the  mean  delay  for  a  fixed  mean  time  between  false 
alarms,  when  the  change  time  is  equally  likely  to  occur  any  time  during  the  block.  The 
minimax  criterion  attempts  to  minimize  the  expected  delay  for  the  least  favorable  change 
time  for  a  fixed  mean  time  between  false  alarms.  The  least  favorable  change  time  is  defined 
to  be  the  time  during  the  integration  period  which  yields  the  largest  expected  delay  in 
detection.  Using  an  asymptotic  analysis,  a  procedure  is  presented  to  determine  the  optimal 
sample  size  and  the  detection  threshold  for  these  criteria. 

In  this  chapter,  the  question  of  the  optimal  sample  size  for  discrete-time  observations 
or  the  optimal  integration  period  for  continuous-time  observations  will  be  examined  in  the 
context  of  transient  signal  detection.  In  order  to  obtain  tractable  solutions  for  the  discrete¬ 
time  case,  it  is  assumed  that  either  the  observations  are  Gaussian  or  that  the  transient  lasts 
for  a  large  number  of  samples  so  that  the  Central  Limit  Theorem  (CLT)  can  be  applied. 
As  in  [2,3],  the  discrete-time  detection  scheme  is  based  on  fixed  sample  size  test  statistics 
5„  of  the  form 

^m- 

m=nT 

Here,  T  is  the  integer- valued  sample  size  and  is  some  function  of  the  raw  observations 
{i„}.  Candidate  functions  include  the  log-likelihood  ratio  or  the  squared  magnitude  of 
a  DFT  coefficient.  These  statistics  are  computed  sequentially  and  a  detection  is  declared 
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when  a  threshold  h  is  exceeded.  In  continuous  time,  some  form  of  the  CLT  could  be  used 
for  non-Gaussian  observations,  but  this  raises  many  technical  issues  such  as  the  definition 
of  the  continuous-time  integraJ.  In  order  not  to  complicate  the  essential  problem  being 
investigated,  the  continuous-time  observations  are  assumed  to  come  from  a  white  Gaussian 
process.  The  detection  scheme  in  continuous  time  is  based  on  test  statistics  5n  of  the  form 

tiT 

Sn  =  I  x{t)dt. 

(n-l)T 

Here,  T  is  the  real-valued  integration  period  and  x{t)  represents  the  observations.  Again, 
these  statistics  are  compared  to  a  threshold  h  to  declare  a  detection. 

The  optimal  sample  size  and  integration  period  for  the  detection  of  transient  signals 
will  be  examined  with  the  optimization  carried  out  with  respect  to  both  the  mean  criterion 
and  the  minimax  criterion.  The  performance  criteria  are  the  same  as  those  in  Chapter 
3,  the  mean  time  between  false  alarms  and  the  probability  of  detection.  Thus,  instead  of 
minimizing  the  expected  delay  as  is  done  for  the  quickest  detection  problem,  the  probability 
of  detection  is  maximized  according  to  one  of  the  two  optimality  criteria  for  a  fixed  mean 
time  between  false  alarms. 

In  addition  to  the  false  alarm  constraint  in  the  optimization  problem,  an  additional 
constraint  is  instituted  to  achieve  desirable  solutions.  That  constraint  specifies  that  the 
admissible  set  of  sample  sizes  for  the  optimization  procedure  is  an  interval  restricted  to  some 
neighborhood  of  the  transient  duration.  The  are  two  reasons  to  do  this.  First,  this  ensures 
there  will  be  relatively  little  delay  in  detecting  the  transient.  Second,  if  this  constraint  were 
not  enforced,  the  optimization  would  yield  a  degenerate  and  useless  solution  as  can  be  seen 
as  follows.  The  best  performance  one  could  hope  for  is  a  very  large  mean  time  between 
false  alarms  and  a  probability  of  detection  near  one,  but  this  is  actually  very  easy  obtain. 
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Consider  a  detection  scheme  with  the  integration  period  equal  to  the  desired  mean  time 
between  false  alarms  and  a  detector  which  always  declares  a  transient  present.  This  detector 
has  exactly  the  desired  mean  time  between  false  alarms  with  a  probability  of  detection  of 
one  and  yet  it  provides  no  useful  information. 

The  outline  for  the  remainder  of  this  chapter  is  as  follows.  In  Section  2,  the  exact  prob¬ 
abilistic  model  considered  is  presented  and  the  necessary  equations  for  the  two  optimality 
criteria  are  derived.  Numerical  results  are  discussed  in  Section  3.  Section  4  considers  the 
application  of  the  detection  scheme  to  narrowband  signals  and  conclusions  are  presented  in 
Section  5. 


5.2  The  Transient  Detector  and  the  Optimality  Criterion 

We  first  consider  the  discrete-time  situation.  The  statistical  model  for  the  transient  is 
specified  in  a  manner  similar  to  the  model  in  [2,3].  Assume  there  exists  a  random  sequence 
yn  such  that: 

i)  yn  is  a  strict-sense  stationary  process  with  0  <  £[|y7,|'"]  <  oo  for  m  <  3. 
ii)  y„  is  Z-dependent,  i.e.,  there  exists  an  integer  L  such  that  if  the  event  A  depends  on 
{Vm  ,m  <  n}  and  an  event  B  depends  on  >  L  -f  n},  then  A  and  B  are  independent. 

Assume  the  transient  lasts  r  samples  and  it  begins  at  the  m’th  sample  of  the  n’th  block, 

i.e.,  at  time  to  =  T{n  -  1)  -|-  m.  Then,  the  observations  in  terms  of  the  {^n}  are  given  by 

!yn,  for  n  <  <0 
yn  +  A,  for  to  <  n  <  to  -I-  r  -  1 
?/„,  for  n  >  to  -f-  r. 
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Here,  /4  is  a  positive  constant.  This  model  is  not  as  complicated  as  the  one  used  in  the 
disorder  problem  in  [2,3].  The  model  used  there  allowed  for  a  change  in  the  second  order 
statistics  in  addition  to  a  chcinge  in  mean.  In  that  context,  the  covariance  structures  before 
and  after  the  change  were  made  to  approach  each  other  asymptotically.  A  local  asymptotic 
approach  such  as  this  is  valid  for  the  disorder  problem  because,  once  the  change  time  occurs, 
there  is  an  arbitrarily  long  time  to  detect  the  change  since  the  change  lasts  forever.  Thus, 
even  extremely  weak  signals  could  be  detected  by  integrating  and/or  waiting  long  enough. 
In  the  transient  signal  detection  problem  where  the  signal  has  finite  duration,  this  small 
signal  approach  is  not  realistic,  and  the  problem  becomes  intractable  using  the  more  general 
model. 

As  pointed  out  in  [2,3],  assumption  ii)  implies  that  and  Sn+m  independent  for 
T  >  L  and  m  >  2.  Moreover,  if  T  >  T,  the  L  samples  at  the  beginning  and  end  of  the 
sum  will  contribute  a  negligible  amount  to  the  overall  sum  so  that  all  the  Sn  will  essentially 
be  mutually  independent.  It  is  proved  in  [3]  that  as  T  -»  oo  the  {S„}  tend  to  mutually 
independent  Gaussian  random  variables  using  the  CLT  and  assumptions  i)  and  ii).  We  will 
use  this  fact  for  large  T  with  T  ^  L. 

As  in  [2,3],  we  define  the  following  parameters: 

A^O  =  E[yn] 
al  =  Var[y„] 

ro(m)  =  -^£[(yn  -  Ho){yn+m  -  Mo)]  =  ro(-m) 

^0 

L  L 

10=  '^oim)  =  1  +  2  ro(m) 

m=-L  m=l 

2  2 
(7  =  70(7^ 

13  =  -^  =  -. 
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The  effective  noise  power  and  the  effective  single  sample  signal-to-noise  ratio  (SNR)  /? 
take  into  account  the  effect  of  the  noise  correlation . 

For  the  continuous-time  model,  the  observed  random  process  x{t)  is  defined  by 

x{t)  =  A[u{t  -  to)  -  u{t  -  to  -  t)]  +  n(t),  (1) 

where  /I  is  a  positive  constant,  u(t)  is  the  unit  step  function,  Iq  is  the  unknown  arrival  time, 
r  is  the  transient  duration  which  is  assumed  known,  and  n(t)  is  a  white  Gaussian  noise 
process  with  mean  /xo  and  autocorrelation  function  £[(n(t)  -  /io)(n(t  +  •s)  —  Mo)]  =  (t^6(s). 
The  SNR  is  defined  by  /3  =  A/a.  Setting  r  =  oo  in  (1)  results  in  the  quickest  detection 
problem  that  Shiryaev  investigated  [4]. 

Note  that  the  same  variables  have  been  used  for  the  discrete-time  and  continuous-time 
cases,  namely  the  test  statistics  5„,  the  sample  size  or  integration  period  T,  the  arrival  time 
of  the  transient  <0i  the  transient  duration  r,  the  mean  of  the  noise  mo>  the  effective  noise 
variance  cr^,  and  the  effective  SNR  0.  In  the  remainder  of  this  chapter  both  interpretations 
of  the  variables  apply  unless  otherwise  stated. 

When  the  transient  overlaps  the  n’th  block  of  samples  or  the  n'th  integration  period, 
the  distribution  of  the  test  statistic  {^n}  is 

Sn  ~  .V(.4r„  +  fiQT,Ta‘), 

where  Tn  represents  the  duration  (in  samples  or  seconds)  of  the  period  of  overlap  of  the 
transient  in  the  n’th  block.  To  derive  this  for  the  discrete  time  case,  it  is  assumed  that  T 
is  large  and  T  L  [2,3].  The  random  variables  (5^)  are  mutually  independent. 

When  =  0,  such  as  before  the  transient  arrives,  the  test  statistic  is  distributed 


according  to  Sn  ~  N{fioT,Ta2)-  Thus,  the  probability  q  that  a  false  aljirm  results  from  5n 
is  given  by 

a  =  P{5„  >  hlr„  =  0}  =  1  -  $  , 

where  $(•)  is  the  standard  Gaussian  cumulative  distribution  function.  Thus,  on  the  average 
there  wiU  be  one  false  alarm  every  1/a  integration  periods  so  that  the  mean  time  between 


false  alarms  Mf  is 


It  will  be  convenient  to  define  a  normalized  threshold  A  given  by 


^|■oT  _  -h 


(2) 


The  probability  of  detection  is  a  function  of  the  true  arrival  time  Iq,  but  it  suffices  to 
consider  the  arrival  time  modulo  T  so  that  it  is  assumed  that  <o  lies  in  the  range  0  <  to  <T. 
There  are  two  cases  to  consider:  T  >  r  and  T  <  t.  For  the  first  case,  the  transient  can 
either  overlap  with  one  or  two  blocks  as  shown  in  Figure  1.  The  transient  is  completely 
contained  in  one  integration  period  Iq  <T  -  r.  If  to  >  T  -  r,  then  the  transient  overlaps 
with  two  blocks.  The  probability  of  detection  Pd{to)  is  easily  found  to  be 

^  +  '^)  1  if  <0  <  T  -  r 

(3) 

When  T  <  T,  the  transient  overlaps  with  at  least  two  intregration  periods  as  shown  in 
Figure  2.  As  seen  in  the  figure,  there  are  blocks  on  either  end  of  the  transient  that  contain 
a  mixture  of  noise  only  and  transient  plus  noise  parts,  and  there  are  a  number  of  blocks 
in  the  center  which  are  completely  characterized  by  the  transient  plus  noise  situation.  The 

IG5 


- \  I  r 

(n-1)T  nT  (n+1)T  (n-1)T  nT  (n+1)T 


(a) 


(b) 


Figure  1.  When  T  >  r,  the  transient  can  overlap  (a)  one  or  (b)  two  integration  periods. 


rj — 

^  1 

.  -  V 

■  »  ^ 

^  »  r—  T  " 

Figure  2.  An  example  of  how  the  transient  overlaps  the  integration  periods  for  T  <  r. 
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number  k{to)  of  the  center-type  blocks  for  the  continuous-time  case  can  be  found  to  be 


k{to)  = 


[r/TJ-1,  ifO<to<T(Lr/rj-hl)-r 


l[r/rj,  iiTi[r/T\  +  l)-T<to<T, 

where  [zj  is  the  greatest  integer  less  than  or  equal  to  z.  For  the  discrete-time  case,  the 
two  regions  of  support  for  A:(to)  may  differ  from  those  in  (4)  by  one  sample,  but  this  will  be 
insignificant  when  T  is  large.  The  probability  of  a  detection  resulting  from  the  center-type 
blocks  Pj(to)  is  one  minus  the  probability  that  there  is  no  detection  from  any  of  them,  so 


The  probability  of  a  detection  from  the  first  partially  overlapping  block  Pj(to)  is  g’ven  by 


'(fo)  =  $( 


/3{T-to) 


+  A  )  . 


The  probability  of  a  detection  from  the  last  overlapping  block  Pj(io)  is  given  by 


+  ifO  <  to  <  r(Lr/TJ  +  l)-r 

PjOo)  =  I  ^  (") 

^ ^ a],  ifr(Lr/rj  +  i)-r<to<r. 


The  overall  probability  of  detection  Pj(to)  is  now  given  by 


^d(fo)  =  PjVo)  +  (1  -  Pj((o)]Pj((o)  +  [I  -  Pj^(to)J  [l  -  P/(to)j  P^((o).  (S) 


The  mean  probability  of  detection  P^,  which  is  obtained  by  assuming  that  the  transient 
signal  can  begin  at  any  time  during  a  block  of  samples  or  an  integration  period  with  equal 


probability,  is  ^ven  by 


1 

f’j  =  7E 


for  the  discrete-time  case  and 


i 

'Pd  =  jj  Pd{l)dt 
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for  the  continuous- time  case.  In  the  sequel,  it  will  be  shown  that  the  sum  in  (9)  for  large  T 
can  be  replaced  by  an  integral  of  type  (10)  with  T  =  1,  and  the  resulting  approximation  is 
better  for  larger  sample  sizes  T.  Using  the  mean  optimality  criterion,  the  objective  is  to  find 
the  sample  size  or  integration  period  T*  which  maximizes  according  to  the  optimization 
problem 

Maximize  with  respect  to  T: 

subject  to:  T  £  nbd{T) 

Mp  fixed. 

The  sample  size  is  restricted  to  an  interval,  denoted  by  n6d(r),  containing  the  point  T  =  t. 
The  endpoints  of  this  interval  are  chosen  so  that  the  ma.ximum  of  P^  with  T  £  nbd(T)  occurs 
in  the  interior  of  the  interval  and  not  at  one  of  the  endpoints.  The  reasons  for  adding  this 
restriction  were  discussed  in  the  introduction  and  will  be  clearer  when  the  performance 
curves  are  discussed.  No  simplifications  could  be  made  to  facilitate  the  computation  of  T* 
and  we  must  resort  to  numerical  techniques. 

For  the  minimax  optimality  criterion,  it  is  desired  to  find  the  sample  size  or  integration 
period  T**  which  maximizes  the  probability  of  detection  for  the  least  favorable  transient 
signal  arrival  time.  The  optimization  problem  to  find  T**  is 

Maximize  with  respect  to  T:  min  PAto) 

subject  to:  T  £  n6d(r)(ll) 

Mp  fixed. 

Again,  no  simple  analytical  expressions  could  be  found  for  T** .  However,  it  can  be  verified 
from  (3)  that  PA^o)  for  T  >  r  and  to  >T  -  t  has  an  axis  of  symmetry  about  the  time  <oo 
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given  by 


too  =  T  -  r/2. 

For  T  <  r,  we  find  from  (4)-(8)  that  Pd{to)  for  0  <  to  <  T'(  [r/TJ  +  1)  -  r  has  an  axis  of 
symmetry  about  the  time  tgi  given  by 


<01  =  2 


r(Lr/TJ  +  l)-r 


and  that  for  T{[t/T\  +  1)  —  t  <  to  <  T ,  Pjito)  has  an  axis  of  symmetry  about  the  time 
t02  given  by 

to2  =  \[Ti[T/T\+2}-T  . 

Thus,  {<oi}  must  be  the  positions  of  local  extrema  of  Pj(to)-  Some  preliminary  numerical 
computations  have  indicated  that  one  of  these  arrival  times  corresponds  to  a  local  and 
possibly  global  minimum  for  a  variety  of  signal-to-noise  ratios  and  values  of  the  mean  time 
between  false  alarms.  Define  the  set  A{T)  of  arrival  times  by 

'  (o+,(T  -  r)-,(T  -  r)+,too,T-l,  if  T  >  r 

A{T)  J  . 

|o+,t0i,<02,[r(Lr/rj  + 1) - r]-  ,[r([r/rj  + 1)  -  r]+  ,r-|,  if r  <  r 

which  is  the  set  containing  the  positions  of  the  known  local  extrema,  endpoints,  and  dis¬ 
continuity  points  of  Pdito)-  To  simplify  the  numerical  computation  of  T**  for  the  minimax 
criterion  in  (11),  the  following  optimization  problem  which  finds  an  approximate  solution 
T**  is  substituted:  the  minimax  criterion,  (11)  is  replaced  with 

Maximize  with  respect  to  T:  min  Pd(to) 

io€A{T) 

subject  to:  T  £  nbd{T) 

Mf  fixed. 


Define  P^'^  by 


nti'C  _ 

M  - 


min 

«oe.-ur) 


PdUo)- 
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(12) 


5.3  Numerical  Results 


Using  the  formulas  obtained  in  the  previous  section,  curves  were  generated  indicating 
the  probability  of  detection  versus  the  integration  period  for  the  continuous-time  case. 
Figure  3  shows  the  mean  probability  of  detection  Pj  as  a  function  of  the  integration  period 
for  a  transient  of  duration  r  =  1,  a  signal-to-noise  ratio  (SNR)  of  20  dB,  and  a  mean  time 
between  false  alarms  Mf  =  10^.  The  integration  period  is  shown  on  a  logarithmic  scale  so 
as  to  exhibit  a  rough  symmetry  about  the  local  maximum  occuring  near  T  =  0.58.  Note 
that  the  performance  is  arbitrarily  good  for  large  values  of  Mf  but  the  probability  of  false 
alarm  a  becomes  unacceptable  for  these  values.  For  example,  for  an  integration  period  near 
T  =  10^,  the  probability  of  detection  is  near  one  but  so  is  the  probability  of  false  alarm. 
For  the  reasons  mentioned  in  the  introduction,  we  will  focus  on  the  local  minimum  which 
occurs  near  T  =  1  and  thus  the  optimization  is  restricted  to  an  interval  about  T  =  1. 

Another  graph  of  the  mean  probability  of  detection  versus  the  integration  period  is 
shown  in  Figure  4(a)  for  a  transient  of  duration  t  =  1,  an  SNR  of  5  dB,  and  a  mean 
time  between  false  alarms  Mf  ~  10^.  Again,  for  large  values  of  Mf,  the  performance 
is  arbitrarily  good.  An  enlargement  of  the  region  around  T  =  1  is  shown  in  Figure  4(b) 
which  indicates  the  presence  of  a  local  meuximum.  From  this  figure  it  is  apparent  that  by 
choosing  T  =  10^  the  probability  of  defection  will  be  greater  taan  that  obtained  at  the  local 
maximum  near  T  =  \.  For  T  =  10^,  the  probability  of  false  alarm  is  u  =  10  ®  which  might 
be  quite  acceptable.  Nevertheless,  we  choose  to  focus  on  the  local  majdmum  near  T  =  1  for 
the  reasons  previously  discussed. 

Similar  plots  were  generated  for  the  worst  case  probability  of  detection  defined  in 
(12).  In  Figure  5,  the  worst  case  probability  of  detection  is  plotted  versus  the  integration 
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Figure  3.  The  mean  probability  of  detection  as  a  function  of  the  integration  period  T 
for  a  trzmsient  duration  r  =  1,  a  signal-to-noise  ratio  of  20  dB,  and  a  mean  time  between 
false  alarms  Mf  =  10®. 


171 


1.0 


Integration  Period  T  (seconds) 

(b) 

Figure  4.  (a)  The  mean  probability  of  detection  Pj  as  a  function  of  the  integration  pe.nod 
T  for  a  transient  duration  r  =  1,  a  signal-to-noise  ratio  of  5  dB,  and  a  mean  time  between 
false  alarms  Mf  —  10^.  (b)  An  enlargement  of  the  area  around  T  =  1  showing  the  local 
maximum  in  the  curve. 
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Integration  Period  T  (seconds) 


Figure  5.  The  worst  case  probability  of  detection  as  a  function  of  the  integration 
period  T  for  a  transient  duration  r  =  1,  a  signal-to-noise  ratio  of  20  dB,  and  a  mean  time 
between  false  alarms  Mf  =  10®. 
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Integration  Period  T  (seconds) 

(a) 


Integration  Period  T  (seconds) 


(b) 

Figure  6.  (a)  The  worst  case  probabilit:  of  detection  as  a  function  of  the  integration 
period  T  for  a  transient  duration  r  =  1,  a  signal-to-noisc  ratio  of  5  dB,  and  a  mean  time 
between  false  alarms  Mp  =  10®.  (b)  .\n  enlargement  of  the  area  around  T  =  1  showing  the 
local  maximum  in  the  curve. 
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period  with  the  same  parameters  as  in  Figure  3.  The  results  for  the  parameters  used  in 
Figure  4  are  shown  in  Figure  6.  Again,  the  remarks  made  in  connection  with  finding  the 
optimal  integration  period  to  ma-\imize  the  mean  probability  of  detection  apply  to  finding 
the  optimal  integration  for  the  minimax  criterion. 

For  a  transient  duration  of  r  =  1,  the  optimal  integration  period  was  numerically 
computed  as  a  function  of  the  signal-to-noise  ratio  and  the  mean  time  between  false  alarms. 
It  is  shown  in  Figure  7  for  the  mean  optimality  criterion  and  in  Figure  8  for  the  minimax 
optimtdity  criterion.  The  optimal  integration  period  for  the  minimax  criterion  varies  very 
little  with  the  SNR  and  the  mean  time  between  false  alarms.  The  optimal  value  in  this  case 
is  approximately  T**  «  0.56. 

A  procedure  is  now  presented  to  determine  the  optimal  integration  period  for  either 
criterion  for  an  arbitrary  transient  duration  for  the  continuous-time  case  and  for  large 
sample  sizes  in  the  discrete-time  case  using  the  curves  in  Figures  7  and  8.  To  do  this,  it  is 
useful  to  express  the  probability  of  detection  Pdi^o)  given  in  equations  (3)-(8)  as  a  function 
of  all  the  relevant  variables  cis  follows: 

Pdik)  =  Pdih,'’',T,MfjP,A,(7o,fJ.o)- 


It  is  easily  verified  that 

Pd{to,T,T,Mf,l3,A,ao,^o)  =  Pd  . 

Thus,  the  optimization  with  respect  to  T  for  a  given  transient  duration,  mean  time  between 
false  alarms,  and  SNR  is  equivalent  to  the  optimization  for  a  transient  of  duration  r  =  1 
using  the  normalized  parameter  values. 


175 


Mean  Time  Between  False  Alarms  (seconds) 
(a) 


Figure  7.  The  optimal  integration  period  T*  for  the  mean  criterion  for  a  transient  duration 
r  =  1  (a)  as  function  of  the  mean  time  between  false  alarms  for  a  fixed  signal-to-noise  ratio 
and  (b)  as  a  function  of  the  signal-to-noise  ratio  for  a  fixed  mean  time  between  lalse  alarms. 
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(a) 


Figure  8.  The  optimal  integration  period  T**  for  the  minimax  criterion  for  a  transient 
duration  r  =  1  (a)  as  function  of  the  mean  time  between  false  alarms  for  a  fixed  signal-to- 
noise  ratio  and  (b)  as  a  function  of  the  signal-to-noise  ratio  for  a  fixed  mean  time  between 
false  alarms. 
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The  procedure  is  as  follows: 


1.  Find  Tj  from  Figure  7  or  Tj**  from  Figure  8  for  a  mean  time  between  false  alarms 
Mpjr  and  an  effective  SNR  of  ^  =  y/rAla. 

2.  Compute  T*  =  tT{  or  f**  =  tT^*  . 

3.  Compute  the  threshold  from  (2),  that  is 

h  =  mT-  -  (£;) 

or 

h  =  .  (13) 

5.4.  The  Application  to  Narrowband  Transients 


In  this  section,  the  design  procedure  is  applied  to  the  problem  of  detecting  a  narrowband 
transient  signal  in  white  Gaussian  noise.  The  problem  formulation  is  almost  exactly  as  set 
up  in  [2,3]  for  the  quickest  detection  problem  and  it  is  repeated  here  with  the  appropriate 
modifications  for  the  transient  problem.  The  signal  and  noise  are  assumed  to  be  real  valued. 
The  signal  is  a  deterministic  sinusoid  with  random  phase.  Length  N  Discrete  Fourier 
Transforms  (DFT)  are  computed  by  the  windowing  and  overlapping  methods  described 
in  [ij.  N  is  chosen  so  that  at  the  sampling  frequency  /,,  the  frequency  resolution  /j/A' 
equals  the  signal  bandwidth.  The  single  sample  statistic  Zm  is  the  squared  magnitude  of 
the  coefficient  corresponding  to  the  frequency  band  of  interest  from  the  m’th  DFT.  Since 
the  DFT’s  are  overlapped,  the  statistics  {zm}  will  be  correlated,  but  the  correlation  will  be 
nonzero  only  for  a  finite  number  of  lags  for  white  Gaussian  noise.  The  random  signad  phase 
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is  modeled  as  approximately  constant  for  the  duration  of  each  DFT  and  independent  of  the 
phase  during  other  DFT’s  as  well  independent  of  the  noise.  Let  M  be  the  number  of  DFT’s 
computed  to  form  each  test  statistic  5„  which  is  to  be  optimized  to  detect  a  transient  of 
length  K  DFT’s.  It  is  assumed  that  K  and  hence  the  optimal  value  of  M  is  large  enough 
to  ensure  that  test  statistics  have  the  required  Gaussian  distribution. 


Proceeding  as  in  [2,3],  let  A  be  the  signal  amplitude,  u  the  frequency  offset  of  the 
signal  from  center  cell,  the  noise  variance,  and  w{n),  n  =  0, 1,...,A'^  -  1  the  window 
coefficients.  Let  r  be  the  percent  overlap  of  successive  DFT’s,  i.e.,  rN  points  overlap.  From 
[l],  we  have 

N-\ 


n=0 


70 


_  r  1  +  2C'(0.5),  for  50%  overlap,  r  =  0.5 

“  I  1  +  2C2(0.75)  +  2C-(0.5),  for  75%  overlap,  r  =  0.75 


where 


C(r)  = 


rA'  — 1 

xv{n)iv(n  +  [1  -  rjA’) 

n=0 


The  effective  single  sample  SNR  0~  in  terms  of  the  input  SNR  in  the  signal  bandwidth 
is  given  by  [2,3] 


■>  o  NA- 

0-  =  0-G{-:>)  = 

4(7- 


where  the  gain  G(J;)  is  given  by 


G{Z-)  =  Gp  ■  [I,(u;)]-  •  Lc 

_  1 
^  N 


Lc  = 


-J-'n 


1 


Gp  is  the  window  processing  gain  or  tlie  reciprocal  of  the  equivalent  noise  bandwidth,  Ls(u;) 
is  the  scalloping  loss,  and  Lc  is  the  loss  due  to  the  correlation  between  overlapped  DFT’s. 
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Example.  Suppose  we  are  trying  to  detect  a  transient  of  length  K  =  100  DFT’s  with  an 
input  SNR  per  cell  of  -5  dB.  A  Hamming  window  is  used  with  50%  overlap.  We  assume 
the  worst  case  scalloping  loss  so  that  u  =  tt/N .  From  [1,  table  1],  we  have  that  [2,3] 

lOlogjoG  =  —3.35  dB, 


so  that  the  effective  SNR  is 


/3  =  -8.35  dB. 


We  will  use  the  minimax  optimality  criterion  to  find  the  optimal  number  of  DFT’s  M* 
for  a  mean  time  between  false  alarms  Mf  =  10®  DFT’s.  To  use  Figure  8,  the  normalized 
SNR  and  mean  time  between  false  alarms  are  computed  by 

0'  =  0  +  101og,o(/\')  =  -8.35  +  20  =  11.65  dB 


Mf  =  Mf /lOO  =  10®  DFTs. 

From  Figure  8(b),  f**  =  0.55  so  tliat  M’  =  (0.55)(100)  =  55  DFT’s.  From  [1,2,3], 
/iQ  =  (To  =  0.397A’(T^  and  7o  =  TH-  The  threshold  h*  is  computed  from  (13), 


\MfJ 

sr  -  (^); 


= 

=  (3G.95):V<7-. 


5.5  Conclusion 


In  this  chapter,  the  problem  of  the  optimal  sample  size  or  integration  period  for  the 
detection  of  a  transient  signal  was  examined.  While  the  framework  for  the  discrete-time 
problem  was  more  general  than  that  for  continuous  time,  both  cases  relied  on  using  the 


Gaussian  distribution  for  the  test  statistics.  For  the  continuous-time  situation,  a  set  of 


curves  was  numericcilly  computed  which  indicate  the  optimal  integration  period  for  both 
the  mean  optimality  criterion  and  tlie  minimax  optimality  criterion.  A  design  procedure 
was  given  to  use  these  curves  for  transients  of  arbitrary  length  in  the  continuous-time  case 
and  transients  consisting  of  a  large  number  of  samples  in  the  discrete-time  case.  It  was 
shown  that  the  results  could  be  applied  to  the  detection  of  narrowband  transient  signals. 

Throughout  this  chapter,  it  has  been  assumed  that  the  duration  of  the  transient  to  be 
detected  is  known  to  a  high  degree  of  accuracy.  If  the  mechanics  of  the  phenomenon  which 
generates  the  transient  is  well  understood,  then  this  duration  may  be  known.  However, 
in  some  situations  only  an  estimate  is  available  or  the  duration  may  in  fact  be  a  random 
variable.  Now,  the  results  presented  in  this  chapter  are  no  longer  optimal  although  one  may 
still  want  to  use  the  “optimal”  sample  size  or  integration  period  for  the  average  transient 
duration.  In  such  a  situation,  this  work  serves  more  as  a  genered  guide  in  the  design  of  the 
detector.  It  demonstrates  that  the  integration  period  should  be  one  half  to  one  times  the 


true  transient  duration. 
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6 


Conclusion 


6.1  Summary  and  Future  Work 


This  dissertation  investigated  sequential  decision  procedures  to  detect  changes  in  the 
statistical  model  of  an  observed  random  process  when  these  changes  can  occur  at  unknown 
times.  Specifically,  the  disorder  problem  and  the  transient  problem  were  examined  with  the 
appropriate  performance  criteria  for  the  application.  Below,  the  results  of  this  study  are 
summarized  along  with  suggestions  for  further  research. 

In  Chapter  2,  Page’s  tost  was  generalized  by  considering  arbitrary  nonlinearities  for 
detecting  a  change  in  distribution.  A  simple  performance  measure  was  defined  which  corre¬ 
sponds  to  the  slope  of  the  performance  curves  for  a  large  mean  time  between  false  alarms. 
An  analytic  bound  was  found  for  this  measure  which  allows  the  performance  of  different 
nonlinearities  to  be  easily  compared.  Also,  by  considering  the  local  performance  of  Page’s 
test,  it  was  shown  that  the  measure  coincides  with  classical  detector  efficacy  in  the  or¬ 
dinary  binary  hypothesis  testing  situation.  These  results  relied  on  the  assumption  that 
the  observations  before  and  after  the  disorder  were  independent  and  identically  distributed 
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random  variables.  While  there  have  been  some  results  for  the  quickest  detection  problem 
with  dependent  observations  using  a  likelihood  ratio  approach,  there  has  been  no  work  on 
using  variants  of  Page’s  test  with  general  nonlinearities  for  the  dependent  case.  The  recent 
results  on  extending  Wald’s  fundamental  identity  to  certain  dependent  sequences  [3]  might 
provide  the  necessary  framework  for  the  development  of  a  performance  measure  for  Page’s 
test  using  dependent  obs3rvations  paralleling  the  work  in  Chapter  2. 

Chapter  3  considered  the  application  of  Page’s  test  to  the  detection  of  transient  signals. 
The  test  statistics  were  derived  from  the  DFT  of  the  observations  to  allow  for  the  detection 
of  a  change  in  the  energy  spectrum.  Several  other  sequential  tests  were  also  considered 
for  this  problem.  Using  Monte  Carlo  simulations,  the  performances  were  compared  and  it 
was  found  that  Page’s  test  performed  as  well  as  or  better  than  the  other  tests  in  nearly  all 
situations.  This  work  represents  the  first  time  that  the  performances  of  these  sequential 
tests  are  evaluated  in  the  context  of  transient  detection.  There  is  much  more  work  that 
could  be  done  on  the  analytic  evaluation  of  sequential  tests.  Theoretical  questions  such  as 
the  expected  hitting  times  of  moving  average  processes  could  be  investigated  to  provide  a 
more  complete  framework  for  analyzing  sequential  tests. 

The  Gabor  representation  was  investigated  in  Chapter  4.  It  was  shown  that  such 
a  representation  could  be  useful  in  detecting  transient  signals  in  noise.  It  was  further 
demonstrated  that  for  the  purposes  of  detection  and  estimation,  the  maximum  likelihood 
estimates  of  the  Gabor  coefficients  may  be  more  appropriate  to  use  than  those  obtained 
through  the  biorthogonaJ  function  procedure.  Several  issues  remain  unresolved,  namely, 
the  effects  of  window  mismatch,  delay  mismatch,  and  frequency  mismatch.  Moreover,  when 
mismatch  is  expected,  a  window  function  which  is  robust  over  an  uncertainty  class  of  possible 
signals  might  be  desirable,  similar  to  the  idea  of  robust  matched  filtering.  Also,  the  merits 
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of  this  approach  should  be  compared  to  more  traditional  methods  such  as  the  short  time 
Fourier  Transform  when  the  representation  does  not  exactly  match  the  actual  situation. 
One  might  also  want  to  investigate  the  relative  merits  of  the  Gabor  representation  and  the 
more  traditional  windowing  methods  for  the  DFT  used  to  enhance  resolution  [1].  Finally, 
the  wavelet  representation  [2]  has  recently  received  much  attention  as  an  alternative  to  the 
Gabor  representation  but  no  detection  schemes  have  been  formulated  and  evaluated  using 
wavelets. 

The  optimal  sample  size  for  a  transient  detector  was  explored  in  Chapter  5.  A  procedure 
was  given  to  determine  the  optimal  sample  size  or  integration  period  for  the  detection 
of  a  simple  transient  with  known  duration.  Further  work  could  be  done  on  robustifying 
the  detection  procedure  to  allow  for  arbitrary  transient  lengths.  Perhaps  this  could  be 
accomplished  by  using  detection  statistics  corresponding  to  different  time  resolutions  as 
might  result  from  utilizing  the  wav'elet  representation. 

As  a  whole,  this  dissertation  has  focused  on  sequential  detection  procedures  and  their 
evaluation  with  appropriate  performance  criteria.  Since  many  real  signal  processing  schemes 
are  implemented  in  a  sequential  fashion  but  are  usually  only  evaluated  based  on  fixed  sample 
size  performance,  this  is  an  area  that  warrants  further  study. 
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